INTRODUCTION


Research simply means a search for facts: answers to questions and solutions to
problems. It is a purposive investigation. It is an organized inquiry. It seeks to find
explanations for unexplained phenomena, to clarify doubtful facts and to correct
misconceived notions.


Research is a scientific endeavour and involves the scientific method. The 
scientific method is a systematic step-by-step procedure following the logical 
processes of reasoning. Scientific method is a means for gaining knowledge of the 
universe. It does not belong to any particular body of knowledge; it is universal. It
does not refer to a specific field or subject matter, but rather to a procedure or
mode of investigation.


Research methodology refers to the procedures used in making systematic 
observations or otherwise obtaining data, evidence, or information as part of a
research project or study. It defines what the activity of research is, how to proceed, 
how to measure progress, and what constitutes success. 


This book, Research Methodology, is written with the distance learning 
student in mind. It is presented in a user-friendly format using clear, lucid language.
Each unit contains an Introduction and a list of Objectives to prepare the student 
for what to expect in the text. At the end of each unit are a Summary and a list of 
Key Words, to aid in recollection of concepts learnt. All units contain Self- 
Assessment Questions and Exercises, and strategically placed Check Your Progress 
questions so the student can keep track of what has been discussed.


UNIT 1 INTRODUCTION TO RESEARCH


Structure 
1.0 Introduction 
1.1 Objectives 
1.2 Meaning of Research 
1.3 Types of Research 
1.3.1 Exploratory Research 
1.3.2 Conclusive Research 
1.4 The Process of Research 
1.5 Research Applications in Social and Business Sciences 
1.6 Features of a Good Research Study 
1.7 Answers to Check Your Progress Questions 
1.8 Summary 
1.9 Key Words 
1.10 Self Assessment Questions and Exercises 
1.11 Further Readings 


1.0 INTRODUCTION 


You might have watched on TV the panel discussion that takes place before the
start of a cricket match. The facilitator asks the panel members questions like:

• Which side will win the match today?
• Will Sachin Tendulkar score a century?
• What score will the batting side pile up?


You may have noted that to answer these questions, the panel members quote factors
such as the following:

• The outcome of previous instances when the two sides met and the winning
streak of the teams at the venue
• The number of centuries Tendulkar has scored on a particular ground and
against the opposing side
• Weather conditions, etc.

What the panel members are doing is using the existing evidence
or data systematically to make match predictions. In other words, we could say
that they are using research methodology to answer the questions.

Research methodology refers to the procedures used in making systematic
observations or otherwise obtaining data, evidence, or information as part of a
research project or study. It defines what the activity of research is, how to proceed,
how to measure progress, and what constitutes success. We will study more about
the various aspects of research methodology in the unit. First, let us understand
what research is.


Research helps in decision making, especially in business. Effective decisions
lead to managerial success, and this requires reducing the element of risk and
uncertainty. For example, let us say an ice-cream company has come up with a
new flavour of ice-cream, which is a mixture of mango and vanilla. The company is
considering two names: ‘Aam Masti’ or ‘Mango Mania’. It would like to sell the
ice-cream to children and is not sure which name has more appeal. One of the ways
in which this can be decided is by using the scientific method of enquiry and following
a structured approach to collect and analyse information and then eventually subject
it to the manager’s judgement. This is no magic mantra but a scientific and structured
tool available to every manager, namely, research. Thus, research refers to a wide
range of activities involving a search for information, which is used in various disciplines.


Research activities may range from a simple collection of facts (for example, the
number of MBA students who opt for higher studies abroad in a particular institute) 
to validation of information (for example, is the new diet cola more popular among 
women?) to an exhaustive theory and model construction (for example, constructing 
a model of India’s weather patterns in 2050 based on climate change projections). 


In this unit, we will discuss the meaning of research, the types of research
available to the researcher and the process of a research study. We will also
discuss the application of research in different areas of management and describe 
the features of a good research study. 


1.1 OBJECTIVES 


After going through this unit, you will be able to: 


• Define the concept of research in management
• Identify the types of research available to a business researcher
• Describe the complete process of a research study
• Explain the application of research in different domains of management
• Identify the criteria needed to classify research as meaningful and ‘good’
research


1.2 MEANING OF RESEARCH 


Different scholars have interpreted the term ‘research’ in many ways. For instance,
Fred Kerlinger (1986) stated that ‘Scientific research is a systematic, controlled
and critical investigation of propositions about various phenomena.’ Grinnell (1993)
has simplified the debate and stated: ‘The word research is composed of two
syllables, “re” and “search”. The dictionary defines the former as a prefix meaning
“again”, “anew” or “over again”. Search is defined as a verb meaning “to examine
closely and carefully”, “to test and try”, or “to probe”. Together, they form a noun
describing a careful, systematic, patient study and investigation in some field of
knowledge, undertaken to establish facts or principles.’


Thus, drawing from the common threads of the above definitions, we derive 
that management research is an unbiased, structured, and sequential method 
of enquiry, directed towards a clear implicit or explicit business objective. 
This enquiry might lead to proving existing theorems and models or arriving at new 
theories and models. Let us now understand each part of the definition. 


The most important and difficult task of a researcher is to be as objective
and neutral as possible. Even though the researcher might have a lot of knowledge 
about the topic, he/she must not try to deliberately get results in the direction of the 
hypotheses. 


The second thing to be remembered is that you follow a structured and
sequential method of enquiry. For example, you may want to look at the
options available to you if you study abroad. You search the internet and
ask your relatives and friends about the options for studying abroad. This
is search and not research. For research, there must be a structured approach that
you follow; only then can it be called scientific. Thus, you may do a
background analysis of how many students go abroad to study and, based on this,
form a hypothesis that 80 per cent of young Indians go to universities in the USA
for further study. Then, you conduct a small survey amongst the students who are
intending to go abroad for study. Based on the data collected, you are able to
support or reject the hypothesis. So, we can state that you have conducted a
research study. You will study the process of research later in the unit.
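
The hypothesis-testing step in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the survey counts (200 respondents, 150 of whom plan to study in the USA) are hypothetical, the function name proportion_z_test is our own, and the normal-approximation test is one common choice rather than a method prescribed by this unit.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion (normal approximation)."""
    p_hat = successes / n                      # proportion observed in the survey
    std_error = sqrt(p0 * (1 - p0) / n)        # standard error if the hypothesis holds
    z = (p_hat - p0) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value


# Hypothetical survey: 150 of 200 students going abroad choose the USA.
# Hypothesis taken from the text: the true share is 80 per cent.
p_hat, z, p_value = proportion_z_test(successes=150, n=200, p0=0.80)
print(f"observed share = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

With these made-up figures the p-value is above 0.05, so the survey would not give enough evidence to reject the 80 per cent hypothesis at the 5 per cent level. A test of this kind supports or rejects a hypothesis; it does not prove it.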


The last and most important aspect of our definition that needs to be carefully 
considered is the decision-assisting nature of business research. As Easterby- 
Smith, et al. (2002) state, business research must have some practical consequences, 
either immediately, when it is conducted for solving an immediate business problem 
or when the theory or model developed can be implemented and tested in a 
business setting. The world of business demands that managers and researchers
work towards a goal, whether immediate or futuristic, else the research loses its
significance in the field of management. The advantage of doing research is that
one is able to take a decision with more confidence as one has tested it through
research. For example, suppose you conduct a study of young women professionals and
see that they have a need for a night crèche facility when they need to go out of
town on official duty. You can then conduct a small research study to test what
facilities they would like in this crèche and how much they would be willing to pay
for this facility.

In fact, it would not be wrong to say that without the tool of research there
would be no new business practices or methods, as no one would want to start
something new (for example, launch a new product, enter a new market segment,
etc.) without testing it through research.


Check Your Progress 


1. State the most important and difficult task of a researcher.


2. What method of enquiry is required in research? 


1.3 TYPES OF RESEARCH 


Though every research conducted is unique, it is possible to categorize the research 
approach that you may decide to take. Figure 1.1 summarizes the types of research. 


Business Research
   Basic Research | Applied Research

By nature of enquiry:
   Exploratory Research | Conclusive Research
   Conclusive Research: Descriptive Research | Causal Research

Fig. 1.1 Types of Research


Sometimes, research may be done for a purely academic reason of a need to 
know. For example, studies on employee dissatisfaction and attrition led to the 
study of impact of fixed working hours on family life and responsibilities. This 
study led to the organizations realizing that they need to have flexible working hours
so that employees can better manage their work-life balance. The context of this 
kind of study is vast and time period, flexible. This type of research is termed as 
fundamental or basic research. On the other hand, you have studies that are 
specific to a particular business decision. For example, you find that despite being 
such an affordable car the Tata Nano does not find a large number of buyers. 


Thus, the study you undertake would be of practical value to the specific 
organization. Secondly, it has implications for immediate action. This action-oriented 
research is termed as applied research. 


However, now we would like to advise you not to look at the two as 
opposites of each other. It may happen that the research which started as applied 
might lead to some fundamental and basic research, which expands the body of 
knowledge or vice versa. The process followed in both basic and applied research 
is systematic and scientific; the difference between them could simply be a matter 
of context and purpose. 


Research studies can also be classified, based on the nature of enquiry or
objectives, into the following types:

• Exploratory research
• Conclusive research

1.3.1 Exploratory Research


As the name suggests, exploratory research is used to gain a deeper understanding 
of the issue or problem that is troubling the decision maker. The idea is to provide 
direction to subsequent and more structured and rigorous research. The following 
are some examples of exploratory research: 


• Let us say a diet food company wants to find out what kind of snacks
customers like to eat and where they generally buy health food from.
• A reality show producer wants to make a show for children. He would like
to know what kind of shows children like to watch.
• There is an investment bank that would like to know from its customers
about what kind of help they want from the bank while making their
investments.


As can be seen, for the examples above an informal exploratory study would
be needed. Exploratory research studies are less structured, more flexible in
approach and sometimes could lead to some testable hypotheses. Exploratory 
studies are also conducted to develop the research questionnaire. (These will be 
discussed in detail in Unit 3.) The nature of the study being loosely structured 
means the researcher’s skill in observing and recording all possible information 
will increase the accuracy of the findings. 


1.3.2 Conclusive Research 


Conclusive research is carried out to test and validate the study hypotheses. In 
contrast to exploratory research, these studies are more structured and definite. 
The variables and constructs in the research are clearly defined. For example,
consider a study to find the customer satisfaction levels of heavy consumers of
different pizzas on the Pizza Hut menu. This needs, first, a clear definition of
customer satisfaction and, secondly, a rule for identifying heavy consumers. The
timeframe of the study and respondent
selection are more formal and representative. The emphasis on reliability and validity
of the research findings is all the more significant, as the results might need to be
implemented.


Based on the nature of investigation required, conclusive research can further 
be divided into the following types: 


• Descriptive research
• Causal research

Descriptive research


The main goal of descriptive research is to describe the data and characteristics
of what is being studied. The census carried out every ten years by the Government of
India is an example of descriptive research. The census describes the number of
people living in a particular area. It also gives other related data about them. It is 
contemporary and time-bound. Some more examples of descriptive research are 
as follows: 


• A study to distinguish between the characteristics of the customers who
buy normal petrol and those who buy premium petrol.
• A study to find out the level of involvement of middle level versus senior
level managers in a company’s stock-related decisions.
• A study on the organizational climate in different organizations.


All the above research studies are conducted to test specific hypotheses
and trends. For example, we might hypothesize that the level of involvement of
senior level managers is higher than that of middle level managers in stock-related decisions.
They are more structured and require a formal, specific and systematic approach 
to sampling, collecting information and testing the data to verify the research 
hypotheses. 


Causal research 


Causal research studies explore the effect of one thing on another and, more
specifically, the effect of one variable on another. For example, if a fast-food outlet
currently sells vegetarian fare, what will be the impact on sales if the price of the
vegetarian food is increased by 10 per cent? Causal research studies are highly
structured and require a rigid sequential approach to sampling, data collection and
data analysis. This kind of research, like research in pure sciences, requires
experimentation to establish causality. In the majority of situations, it is quantitative
in nature and requires statistical testing of the information collected.
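
The logic of such an experiment can be sketched in Python. The figures below are invented weekly sales (in units) for outlets that kept the old price (control) and outlets that raised it by 10 per cent (treatment). The function name permutation_test and the choice of a permutation test are our own assumptions for illustration; the unit itself does not prescribe a particular statistical test.

```python
import random
from statistics import mean


def permutation_test(control, treatment, n_permutations=5000, seed=0):
    """Estimate how often a random relabelling of outlets gives a difference
    in mean sales at least as large as the one actually observed."""
    rng = random.Random(seed)                  # fixed seed so the sketch is repeatable
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                    # pretend group labels were assigned at random
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical data: weekly units sold per outlet.
control_sales = [520, 498, 510, 530, 505, 515]      # price unchanged
treatment_sales = [470, 455, 480, 462, 475, 468]    # price raised by 10 per cent

effect, p_value = permutation_test(control_sales, treatment_sales)
print(f"change in mean weekly sales = {effect:.1f} units, p-value = {p_value:.4f}")
```

A small p-value suggests that the drop in sales is unlikely to be due to chance alone. The causal reading also depends on the outlets having been assigned to the two groups at random, which is what makes this an experiment.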


Other types of research 


• Diagnostic research: It is similar to descriptive research but with a different
focus. It aims, through in-depth approaches, to reach the basic causal
relations of a problem and possible solutions for it. Prior knowledge of the
problem is required for this type of research. Problem formulation, defining
the population correctly for study purposes, proper methods for collecting
accurate information, correct measurement of variables, statistical analysis
and tests of significance are essential in diagnostic research.

• Historical: Historical research studies the social effects of the past that
may have given rise to current situations, i.e., past incidents are used to
analyse the present as well as the future conditions. The study of the current
state of Indian labour based on past labour union movements in the Indian
economy to formulate the Indian Labour Policy is an example of this type
of research.

• Formulative: It helps examine a problem with a suitable hypothesis. This
research, in social science, is mainly significant for clarifying concepts and
innovations for further research. The researchers are mainly concerned
with the principles of developing hypotheses and testing them with statistical tools.

• Experimental: The experimental type of research enables a person to
calculate the findings, employ statistical and mathematical devices and
measure the results thus quantified.

• Ex post facto: This type of research is similar to experimental research but is
conducted after the event, to deal with situations that have occurred in or around an
organization. Examples of such research are a study of the market failure of an
organization's product conducted later and research into the causes
of a landslide in the country.

• Case study: This method undertakes intensive research that requires a
thorough study of a particular case.

• Cross-sectional research: In this type of research, data is gathered once,
during a period of days, weeks or months. Many cross-sectional studies are
exploratory or descriptive in purpose. They are designed
to look at how things are now, without any sense of whether there is a
history or trend at work.

• Action research: It refers to research that improves the quality of action in
the social world.

• Policy-oriented research: Reports employing this type of research focus
on the question ‘How can problem “X” be solved or prevented?’


Check Your Progress

3. Which type of research is especially carried out to test and validate the
study hypotheses?

4. Census is an example of which type of research?


1.4 THE PROCESS OF RESEARCH


While conducting research, information is gathered through a sound and scientific 
research process. Each year, organizations spend enormous amounts of money 
on research and development in order to maintain their competitive edge. Thus,
we propose a broad framework that can easily be followed in most research studies.
The process of research is interlinked at every stage as shown in Figure 1.2.


Figure 1.2 illustrates a model research process. 


Management Dilemma (Basic vs Applied)
Defining the Research Problem
Formulating the Research Hypothesis
Developing the Research Proposal
The Research Framework: Research Design
Data Collection Plan and Sampling Plan
Instrument Design
Pilot Testing
Data Collection
Data Refining and Preparation
Data Analysis and Interpretation
Research Reporting
Management/Research Decision

Fig. 1.2 The Process of Research


In the following paragraphs we will briefly discuss the steps that, in general, 
any research study might follow: 


The management dilemma 


Any research starts with the need and desire to know more. This is essentially the 
management dilemma. The person facing it could be the researcher himself or herself,
or a business manager who gets the study done by a researcher. The need might be
purely academic (basic or fundamental research) or there might be an immediate 
business decision that requires an effective and workable solution (applied 
research). 


Defining the research problem 


This is the first and the most critical step of the research journey. For example, a 
soft drink manufacturer who is making and selling aerated drinks now wants to 
expand his business. He wants to know whether moving into bottled water would 
be a better idea or he should look at fruit juice based drinks. Thus, a comprehensive 
and detailed survey of the bottled water as well as the fruit juice market will have 
to be done. He will also have to decide whether he wants to know consumer 
acceptance of a new drink. Thus, there has to be complete clarity in the mind of 
the researcher regarding the information he must collect. 


Formulating the research hypotheses 


In the model, we have drawn broken lines to link defining the research problem 
stage to the hypotheses formulation stage. The reason is that every research study 
might not always begin with a hypothesis; in fact, the task of the study might be to 
collect detailed data that might lead to, at the end of the study, some indicative 
hypotheses to be tested in subsequent research. For example, while studying the
lifestyle and eating-out behaviour of consumers at Pizza Hut, one may find that the
young student group consumes more pizzas. This may lead to a hypothesis that
young consumers consume more pizzas than older consumers.


A hypothesis is, in fact, an assumption about the expected results of the
research. For example, in the earlier example of work-life balance among women
professionals, we might start with a hypothesis that the higher the work-family
conflict, the higher the intention to leave the job. We will discuss the conversion of
the defined problem into working hypotheses in Unit 2.


Developing the research proposal 


Once the management dilemma has been converted into a defined problem and a 
working hypothesis, the next step is to develop a plan of investigation. This is 
called the research proposal. The reason for its placement before the other stages 
is that before you begin the actual research study in order to answer the research 
question you need to spell out the research problem, the scope and the objectives 
of the study and the operational plan for achieving this. The proposal is a flexible
contract about the proposed methodology and once it is made and accepted, the
research is ready to begin. The formulation of a research proposal, its types and 
purpose will be explained in the next unit. 


Research design formulation 


Based on the orientation of the research, i.e., exploratory, descriptive or causal, the 
researcher has a number of techniques for addressing the stated objectives. These 
are termed in research as research designs. The main task of the design is to explain 
how the research problem will be investigated. There are different kinds of designs
available to you while doing research. These will be discussed in detail in Unit 3.


Sampling design 


It is not always possible to study the entire population. Thus, one goes about 
studying a small and representative sub-group of the population. This sub-group is 
referred to as the sample of the study. There are different techniques available for 
selecting the group based on certain assumptions. The most important criterion for
this selection would be the representativeness of the sample selected from the 
population under study. 


Two categories of sampling designs available to the researcher are probability 
and non-probability. In the probability sampling designs, the population under study 
is finite and one can calculate the probability of a person being selected. On the 
other hand, in non-probability designs one cannot calculate the probability of 
selection. The selection of one or the other depends on the nature of the research, 
degree of accuracy required (probability sampling techniques generally yield more
accurate results) and the time and financial resources available for the research.
Another important decision the researcher needs to take is to determine the best
sample size to be selected in order to obtain results that can be considered as
representative of the population under study. We will learn more about this in a
later unit.
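
The idea that, in probability sampling, the chance of any person being selected can be calculated is easy to illustrate in Python. The sketch below is a minimal example under stated assumptions: the population of 500 student identifiers is hypothetical, the function name simple_random_sample is our own, and simple random sampling is only one of several probability designs.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement from a finite population
    and report each member's probability of being selected."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    # With a finite population of size N and a sample of size n,
    # every member has the same known chance n / N of being chosen.
    inclusion_probability = sample_size / len(population)
    return sample, inclusion_probability


# Hypothetical finite population: 500 students planning to study abroad.
population = [f"student_{i:03d}" for i in range(1, 501)]
sample, probability = simple_random_sample(population, sample_size=50, seed=42)

print(f"sampled {len(sample)} of {len(population)} students")
print(f"probability of selection for each student = {probability:.2f}")
```

In a non-probability design, such as interviewing whoever happens to be available, no such selection probability can be written down, which is why the accuracy of the results is harder to judge.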


Planning and collecting the data for research 


In the model (Figure 1.2), we have placed planning and collecting data for 
research as proceeding simultaneously with the sampling plan. The reason for 
this is that the sampling plan helps in identifying the group to be studied and the 
data collection plan helps in obtaining information from the specified population. 
The data collection methods may be classified into secondary and primary data 
methods. Primary data is original and collected first hand for the problem under 
study. There are a number of primary data methods available to the researcher 
like interviews, focus group discussions, personal/telephonic interviews/mail 
surveys and questionnaires. 


Secondary data is information that has been collected and compiled earlier
for some other problem or purpose. Examples include company records, magazine
articles, expert opinion surveys, sales records, customer feedback, government
data and previous research done on the topic of interest. This step in the research
process requires careful and rigorous quality checks to ensure the reliability and
validity of the data collected.


Data refining and preparation for analysis 


Once the data is collected, it must be refined and processed in order to answer the
research question(s) and test the formulated hypotheses (if any). This stage requires
editing of the data for any omissions and irregularities. Then it is coded and tabulated
in a manner in which it can be subjected to statistical testing. In the case of data that is
subjective and qualitative, the information collected has to be post-coded, i.e., coded
after the data has been collected.
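
The editing, coding and tabulation steps described above can be sketched as follows. The five-point agreement scale, the raw answers and the names CODEBOOK and edit_and_code are all hypothetical and are used only for illustration.

```python
from collections import Counter

# Hypothetical codebook: maps each valid answer to a numeric code.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def edit_and_code(raw_responses, codebook):
    """Edit the raw answers (trim spaces, ignore case), code the valid ones,
    and set aside omissions and irregular entries for follow-up."""
    coded, rejected = [], []
    for response in raw_responses:
        cleaned = (response or "").strip().lower()
        if cleaned in codebook:
            coded.append(codebook[cleaned])
        else:
            rejected.append(response)          # blank, missing or misspelt answer
    return coded, rejected


# Hypothetical raw answers as they might arrive from a questionnaire.
raw = ["Agree", " agree ", "Neutral", None, "strongly agree", "agre", "Disagree", ""]

coded, rejected = edit_and_code(raw, CODEBOOK)
tabulation = Counter(coded)                     # frequency of each code

print("coded responses:", coded)
print("set aside during editing:", rejected)
print("tabulation:", dict(sorted(tabulation.items())))
```

Keeping the rejected entries, rather than silently dropping them, lets the researcher decide whether to go back to the respondent or to treat the answer as missing.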


Data analysis and interpretation of findings 


This stage requires selecting the analytical tools for testing the obtained information. 
There are a number of statistical techniques available to the researcher, such as frequency
analysis, percentages, arithmetic mean, t-test and chi-square analysis. These will
be explained in the later units. 
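
As a preview, three of these techniques (frequency analysis, percentages and the arithmetic mean) together with a simple chi-square statistic can be sketched in Python. The data are invented: 200 children are assumed to have been asked which of the two ice-cream names from Section 1.0 they prefer, and seven of them are assumed to have rated the flavour on a five-point scale. The function names are our own. The only outside fact used is the standard 5 per cent critical value of the chi-square distribution with one degree of freedom, 3.841.

```python
from collections import Counter
from statistics import mean


def frequency_table(values):
    """Return {category: (count, percentage)} for a list of answers."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (count, 100 * count / total) for category, count in counts.items()}


def chi_square_statistic(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical survey answers.
preferences = ["Aam Masti"] * 118 + ["Mango Mania"] * 82
table = frequency_table(preferences)

# If neither name were preferred, we would expect a 50:50 split.
observed = [table["Aam Masti"][0], table["Mango Mania"][0]]
expected = [len(preferences) / 2, len(preferences) / 2]
chi2 = chi_square_statistic(observed, expected)
CRITICAL_VALUE_5PCT_DF1 = 3.841                 # standard table value, 1 degree of freedom

ratings = [4, 5, 3, 4, 2, 5, 4]                 # hypothetical flavour ratings (1 to 5)
average_rating = mean(ratings)

for name, (count, percent) in table.items():
    print(f"{name}: {count} children ({percent:.1f} per cent)")
print(f"chi-square = {chi2:.2f}, exceeds critical value: {chi2 > CRITICAL_VALUE_5PCT_DF1}")
print(f"average rating = {average_rating:.2f}")
```

With these made-up counts the statistic exceeds the critical value, so the difference in preference between the two names would be treated as statistically significant at the 5 per cent level.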


Once the data has been analysed and summarized, linking the results with 
the research objectives and stating clearly the implications of the study is the most 
important task of the researcher. 


The research report and implications for the manager’s dilemma 


The report preparation, from the problem formulation to the interpretation, is the 
final part of the research process. As we stated earlier, business research 
is directed towards answering the question ‘what are the implications for the 
corporate world?’ Thus, in this step, the researcher’s expertise in analysing, 
interpreting and recommending is very important. This report has to give complete
details about everything that was done, right from problem formulation to the
methodology followed to the conclusions and findings of the study. The nature of the
report may be different depending on whether it is meant for a business person or 
is an academic report. This will be discussed in detail in Unit 13. 


Check Your Progress 


5. What is another name for the process of ‘developing a plan of investigation’? 


6. List some of the primary data methods available to the researcher. 


1.5 RESEARCH APPLICATIONS IN SOCIAL AND BUSINESS SCIENCES


Research is a crucial element in the area of business. It helps the decision maker to 
identify new opportunities for business growth. Research provides information
about various aspects of business, like product life cycle, consumer behaviour,
market opportunities and threats, technological changes, social changes, economic 
changes, environmental changes, and so on, which are important for any decision 
maker to run the business smoothly. 


Research is crucial in the following areas of business: 


• Marketing function - Research is the lifeline in the field of marketing,
where it is carried out on a vast array of topics and is conducted both
in-house by the organization itself and outsourced to external agencies. This
could be related to the 4 Ps: product, price, place and promotion.

• Personnel and human resource management - Human resources (HR)
and organizational behaviour is an area which involves basic or fundamental
research as a lot of academic, macro-level research may be adapted and
implemented by organizations into their policies and programmes.

• Financial and accounting research - The area of financial and accounting
research is quite vast and includes asset pricing, corporate finance and capital
markets, market-based accounting research, modelling and forecasting in
volatility, risk, etc.

• Production and operations management - This area of management is
one in which research results are implemented, taking on huge cost and
process implications. Research in this area relates to operation planning,
demand forecasting, process planning, project management, supply chain
management, quality assurance and management.


Research in social science includes an in-depth study and evaluation of
human behaviour by using scientific methods in either a quantitative or a qualitative
manner. As social science is concerned with the study of society and human behaviour,
it is important for a business organization in terms of understanding its customers,
their tastes, needs, preferences, lifestyle and behaviour. New products or
services are unlikely to succeed without proper consumer studies and surveys.


1.6 FEATURES OF A GOOD RESEARCH STUDY 


In the above sections we learnt that research studies can vary from the loosely 
structured method based on observations and impressions to the strictly scientific 
and quantifiable methods. However, for research to be of value, it must possess
the following characteristics: 


(a) It must have a clearly stated purpose. This refers not only to the objective
of the study, but also to a precise definition of the scope and domain of the
study.

(b) It must follow a systematic and detailed plan for investigating the research
problem. Systematic conduct also requires that all the steps in the
research process are interlinked and follow a sequence.

(c) The selection of techniques of collecting information, sampling plans and
data analysis techniques must be supported by a logical justification about
why the methods were selected.

(d) The results of the study must be presented in an unbiased, objective and
neutral manner.

(e) The research must maintain the highest ethical standards at every stage
and at any cost.

(f) And lastly, the reason for a structured, ethical, justifiable and objective
approach is the fact that the research carried out by you must be replicable.
This means that the process followed by you must be ‘reliable’, i.e., in case
the study is carried out under similar conditions it should be able to reveal
similar results.


Check Your Progress

7. What are demand forecasting, and quality assurance and management a
part of?

8. What does the replicability of a research mean?


1.7 ANSWERS TO CHECK YOUR PROGRESS QUESTIONS


1. The most important and difficult task of a researcher is to be as objective
and neutral as possible.

2. Research always requires a structured and sequential method of enquiry.

3. Conclusive research is especially carried out to test and validate the study
hypotheses.

4. Census is an example of descriptive research.

5. Research proposal is another name for the process of ‘developing a plan of
investigation.’

6. Some of the primary data methods available to the researcher include
interviews, focus group discussions, personal/telephonic interviews/mail
surveys and questionnaires.

7. Demand forecasting, and quality assurance and management are a part
of production and operations management.

8. The replicability of a research means that the process followed by you must
be ‘reliable’, i.e., in case the study is carried out under similar conditions it
should be able to reveal similar results.


1.8 SUMMARY


• Research is a tool of special significance in all areas of management. It can
be defined as an unbiased, structured and sequential method of enquiry,
directed towards a clear implicit or explicit business objective. This enquiry
might lead to proving existing postulates or arriving at new theories and
models.

• Research may be done for a purely academic reason, out of a need to know
(fundamental or basic research), or it could be undertaken because it would be of
practical value to an organization, with implications for immediate action
(applied research).

• Based on the nature of enquiry or the objective, research can be exploratory
or conclusive.

• Conclusive research can be of two types: descriptive or causal studies.

• A research study usually follows a structured sequence of steps:
  o Developing and defining the research problem
  o Formulating the study hypothesis
  o Developing the study plan or proposal
  o Identifying the research design
  o Designing the sampling approach
  o Conceptualizing and developing the data collection plan
  o Executing data analysis
  o Working out data inference and conclusions
  o Compiling and preparing the research report

• Different kinds of studies are carried out in the areas of business management,
such as marketing, finance, human resources and operations, each
having its own orientation and approach.

• For a research study to be recognized as significant, it must meet some basic
criteria: a clearly stated purpose; a systematic and detailed plan; logical
justification for the selection of techniques of collecting information, sampling
plans and data analysis techniques; unbiased, objective and neutral results;
high ethical standards; and a sequential, replicable process.


1.9 KEY WORDS 


• Applied research: Studies that are related to specific problems and are
conducted to find solutions.

• Basic research: Studies that are conducted for academic reasons and do
not have immediate applicability.

• Causal research: Studies that need experimentation and examine the cause
and effect relationship.

• Conclusive research: More structured studies conducted to test or validate
the study hypotheses.

• Descriptive research: Conclusive studies that describe the phenomena,
group or situation under study.

• Exploratory research: Loosely structured studies carried out to gain a deeper
understanding about something.

• Hypothesis: A tentative assumption made in order to draw out and test its
logical or empirical consequences; the assumptions about the expected results
of a research.

• Postulate: Something taken as true or factual and used as the starting
point for a course of action.


1.10 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions

1. How would you define business research? Illustrate with examples.
2. Distinguish between descriptive and causal research studies.
3. What are the features of a good research study?

Long-Answer Questions

1. What are the different types of research that can be conducted by a
researcher?
2. Describe in detail the steps to be carried out in a typical research study.
3. Can research be carried out in all areas of business? Explain, with examples,
the kinds of studies that can be done.


UNIT 2 RESEARCH PROBLEM AND
FORMULATION OF THE 
RESEARCH HYPOTHESES 


Structure 
2.0 Introduction 
2.1 Objectives 
2.2 Defining the Research Problem 
2.3 Management Decision Problem vs Management Research Problem 
2.4 Problem Identification Process 
2.5 Components of the Research Problem 
2.6 Formulating the Research Hypotheses 
2.6.1 Types of Research Hypotheses 
2.7 Writing a Research Proposal 
2.7.1 Contents of a research proposal 
2.7.2 Types of Research Proposals 
2.8 Answers to Check Your Progress Questions 
2.9 Summary 
2.10 Key Words 
2.11 Self Assessment Questions and Exercises 
2.12 Further Readings 


2.0 INTRODUCTION 


In the last unit, you were introduced to the meaning of research as well as its types,
process and features. In this unit, we will focus on the research problem and the 
formulation of the research hypothesis. The most important aspect of the business 
research method is to identify the ‘what’, i.e., what is the exact research question 
to which you are seeking an answer. The second important thing is that the process 
of arriving at the question should be logical and follow a line of reasoning that can 
lend itself to scientific enquiry. This reasoning approach needs to be converted 
into a possible research question. And based on the initial study of the research 
topic, you should be able to make certain assumptions which can lend direction to 
the study as research hypotheses. 


Thus, in this unit, we will understand how to identify a problem that can be
subjected to research and help us reduce decision risks. This will follow a structured
and logical path to help us arrive at the research problem. Next, we will learn how
to convert this research question into research hypotheses. The conduct of a
research study usually requires that you write the steps you will take to do the
study in the form of a proposal. We will end the unit by understanding how one
writes a research proposal.


2.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Explain the business decision problem
• Translate the decision needs into clearly spelt research questions
• Describe the method to be followed to arrive at the research questions
• List the components of a research problem
• Translate the research questions into research hypotheses depending on the
nature of research
• Prepare a research proposal


2.2 DEFINING THE RESEARCH PROBLEM 


The challenge for a business manager is not only to identify and define the decision 
problem; the bigger challenge is to convert the decision into a research problem 
that can lead to a scientific enquiry. As Powers et al. (1985) have put it, ‘Potential 
research questions may occur to us on a regular basis, but the process of formulating 
them in a meaningful way is not at all an easy task’. One needs to narrow down the 
decision problem and rephrase it into workable research questions. 


Thus, the first and most important step of the research process is like
the start of a journey, in this instance the research journey, and the identification of
the problem gives an indication of the expected result. A research problem can be
defined as a gap or uncertainty in the decision makers’ existing body of knowledge 
which inhibits efficient decision making. Sometimes it may so happen that there 
might be multiple alternative paths one can take and we will have to select which 
of these we would like to consider as the problem to be studied. As Kerlinger 
(1986) states, ‘If one wants to solve a problem, one must generally know what 
the problem is. It can be said that a large part of the problem lies in knowing what 
one is trying to do.’ The defined research problem might be classified as simple or 
complex. Simple problems are those that are easy to understand and the components 
and identified relationships are linear, e.g., the relationship between cigarette smoking 
and lung cancer. Complex problems on the other hand, deal with the interrelationship 
between multiple variables, e.g., the impact of social networking sites like Facebook 
and online shopping sites like Flipkart on consumer purchase behaviour in shops 
and markets. The impact might also further differ in terms of males and females. 
Other influencing factors on the buying behaviour could be a person’s lifestyle, 
age and education. Complex problems such as these deal with multiple variables. 
Thus, they require a model or framework to be developed to define the research 
approach.


2.3 MANAGEMENT DECISION PROBLEM VS
MANAGEMENT RESEARCH PROBLEM 


The problem recognition process starts when the decision maker faces some 
difficulty or decision dilemma. Sometimes, this might be related to actual and 
immediate difficulties faced by the manager (applied research) or gaps experienced 
in the existing body of knowledge (basic research). The broad decision problem 
has to be narrowed down to an information-oriented problem, which focuses on the
data or information required to arrive at any meaningful conclusion. Given in Table
2.1 is a set of decision problems and the subsequent research problems that might
address them. Please remember these are only indicative questions and there could
be many more ways of arriving at an answer to the decision problem. Secondly, it
is not essential that the decision maker will always go in for research, as he may
also arrive at a decision without research. Sometimes, the company might have so
much experience in the business that they feel no additional information can be
obtained through research. As stated earlier in Unit 1, research is conducted when
the decision maker wants to reduce some risk and uncertainty while taking a
decision.


Table 2.1 Converting Management Decision Problem
into Research Problem

DECISION PROBLEM                          RESEARCH PROBLEM*

What should be done to increase the       What is the awareness and purchase
consumers of organic food products in     intention of health conscious
the domestic market?                      consumers for organic food products?

How to reduce turnover rates in the       What is the impact of shift duties on
BPO sector?                               work exhaustion and turnover
                                          intentions of the BPO employees?

Can the housing and real estate growth    What is the current investment in real
be accelerated?                           estate and housing? Can the demand in
                                          the sector be forecasted for the next six
                                          months?

* This requires you to follow a sequence of steps as specified in Figure 2.3


Thus, what we clearly see is that the management problem is a difficulty
faced by the decision maker and by itself cannot be tested. To do this it must be
stated in a form that can lend itself to a scientific enquiry. In case the decision
maker is a business manager, the management research problem requires that we
look for an answer to the problem faced by the manager, as in the above
example of how to reduce the turnover rate in a BPO company. This problem has
to be translated into a simpler form of research question. And as said earlier, there
can be more than one research problem that can help the manager in taking a
decision. It depends on how the researcher looks at it. For example, he may
say that the research problem is:

• What are the management policies in other BPO companies?
• Why do the employees leave the company? What is the problem
area?
• Are the shift duties creating a problem of work family conflict, which
is why they leave?
• How can the company work on employee engagement so that employees
stay with the company?

Thus, as you can see, we can have many questions. Finally, the research
problem you think is most likely to give a possible solution is the one you decide to
take as your research problem.


Check Your Progress 
1. The management decision problem must be reduced to which type of 
problem? 
2. Which type of relationships are tested under simple research problems? 


3. Which type of problem is faced by the decision maker at the start of the 
problem recognition process? 


2.4 PROBLEM IDENTIFICATION PROCESS 


The process of identifying the research problem involves the following steps: 
1. Management decision problem 


The entire process begins with the identification of the difficulty encountered by
the business manager/researcher. The manager might decide to conduct the study
himself or give it to a researcher or a research agency. Thus, this step requires
absolute clarity about the purpose of getting a study done.
When the work is to be done by an outsider, it is very important that a discussion is
held with the business manager.


2. Discussion with subject experts 


The next step involves getting the problem in the right perspective through
discussions with industry and subject experts. These individuals are knowledgeable
about the industry as well as the organization. They could be found both within
and outside the company. Information on the current and future situation is obtained
with the assistance of an interview. Thus, the researcher must have a predetermined
set of questions related to the doubts experienced in problem formulation. It should
be remembered that the purpose of the interview is simply to gain clarity on the
problem area and not to arrive at any kind of conclusions or solutions to the
problem. For example, for the organic food study that is mentioned in Table 2.1 as
a decision problem, the researcher might decide to go to food experts like doctors
and dieticians to seek their opinion. This data should, in practice, be supported
with secondary data in the form of theory as well as organizational facts.


3. Review of existing literature


A literature review is a comprehensive collection of the information obtained from 
published and unpublished sources of data in the specific area of interest to the 
researcher. This may include journals, newspapers, magazines, reports, government 
publications, and also computerized databases. The advantage of the survey is 
that it provides different perspectives and methodologies to be used to investigate 
the problem, as well as identify possible variables that may be studied. Second, 
the survey might also show that our research problem has already been investigated 
and this might be useful in solving the decision dilemma. It also helps in narrowing 
the scope of the study into a research problem. 


Once the data has been collected, the researcher must write it down in his/ 
her own words and clearly show how this is linked to the research topic under 
study. The logical and theoretical framework developed on the basis of past studies 
should be able to provide the foundation for the problem statement. 


The reporting should cite the author and the year of the study clearly. There 
are several internationally accepted forms of citing references and quoting from 
published sources. The Publication Manual of the American Psychological 
Association (sixth edition, 2009) and the Chicago Manual of Style (seventeenth 
edition, 2017) are academically accepted as referencing styles in management. 


4. Organizational analysis 


Another significant source for deriving the research problem is the industry and 
organizational data. In case the researcher/investigator is the manager himself/ 
herself, the data might be easily available. This data needs to include the 
organizational demographics—origin and history of the firm; size, assets, nature of 
business, location and resources; management philosophy and policies as well as 
the detailed organizational structure, with the job descriptions. It is to be 
remembered here that the organizational data might not be always essential, for 
example in case of basic research, where the nature of study is not company 
specific but general. 


5. Qualitative survey 


Sometimes the expert interview, secondary data and organizational information
might not be enough to define the problem. In such a case, a small exploratory
qualitative survey can be done to understand the reasons behind the observed situation. For example,
a soap like Dove may be very good in terms of price and quality but very few
people in the smaller towns buy it. When we do a secondary data analysis, or talk
to experts, there seems to be no problem. Then we do a quick round of interviews
with women who come to a kirana store to find out why Dove is not bought. And
the women tell us that the same soap is used by the whole family, and husbands and
sons do not use Dove as they say this is a soap for women, which is the reason
why Dove is not bought by them. These surveys are thus done on small samples
and might make use of focus group discussions or interviews with the respondent
population to help uncover relevant and current issues which might have a significant
bearing on the problem definition.


In the organic food research, focus group discussions with young and old
consumers revealed the level of awareness about organic food and consumer
sentiments related to the purchase of a more expensive but healthy food product.


6. Management research problem 


Once the audit process of secondary review, interviews and survey is over, the
researcher is ready to focus and define the issues of concern that need to be
investigated further, in the form of an unambiguous and clearly defined research
problem. Here, it is important to remember that simply using the word ‘problem’
does not mean that there is something wrong that has to be corrected; it simply
indicates the gaps in information or knowledge base available to the researcher.
These might be the reason for his inability to take the correct decision. Second,
identifying all possible dimensions of the problem might be a monumental and
impossible task for the researcher. For example, the lack of sales of a newly
launched product could be due to consumer perceptions about the product,
ineffective supply chain, gaps in the distribution network, competitor offerings or
advertising ineffectiveness. It is the researcher who has to identify and then refine
the most probable cause of the problem and formalize it as the research problem.
This would be achieved through the five preliminary investigative steps indicated
above. Once this is done, the research problem has to be clearly defined in terms of
certain components. These will be discussed in the next section.


7. Theoretical foundation and model building 


Having identified and defined the variables under study, the next step is to try and 
form a theoretical framework. It can be best understood as a schema or network 
of the probable relationship between the identified variables. An advantage of the 
model is that it clearly shows the expected direction of the relationships between 
the concepts. There is also an indication of whether the relationship would be 
positive or negative. 


This step, however, is not mandatory as sometimes the objective of the 
research is to explore the probable variables that might explain the observed 
phenomena and the outcome of the study helps to finally develop a conceptual 
model. 


Given below is a predictive model for turnover intentions developed to 
explain the high rate of attrition amongst BPO professionals. Once validated, it is 
of course possible to test it in different contexts and with different respondent populations.


The Turnover Intention Model

The proposed model to predict turnover intention is specified as follows:

TI = f (WE, OC, A, MS, TWE)    (1)

where
TI = Turnover intention
WE = Work exhaustion
OC = Organizational commitment
A = Age
MS = Marital status
TWE = Total work experience

The theoretical construct of work exhaustion is influenced by Perceived
Workload (PWL), Fairness of Reward (FOR), Job Autonomy (JA) and Work
Family Conflict (WFC) [Adapted from Ahuja, Chudoba and Kacmar, 2007]. This
can be mathematically written as:

WE = f (PWL, FOR, JA, WFC)    (2)

Similarly, Organizational Commitment depends upon Job Autonomy, Work
Family Conflict, Fairness of Reward and Work Exhaustion (WE) [Adapted from
Ahuja, Chudoba and Kacmar, 2007]. Therefore, this can be stated mathematically
as:

OC = f (JA, WFC, FOR, WE)    (3)


The model is diagrammatically represented in Figure 2.2. 
[Figure 2.2: diagram with boxes for Perceived Workload, Job Autonomy, Work
Family Conflict, Fairness of Reward, Work Exhaustion, Organizational
Commitment, Total Work Experience, Marital Status, Age and Turnover Intentions]

Fig. 2.2 Proposed Model for Turnover Intention

The formulated framework has been presented in three forms. The description
in words is a verbal model. The flowchart of the relationships between the
variables is a graphical model. The three equations specifying the same
relationships form a mathematical model. What needs to be understood is that
all three are representations of the same framework.
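
The mathematical model can also be written down as a small data structure, which makes it easy to see which variables ultimately feed into turnover intention. The Python sketch below is only an illustration. The names `MODEL` and `root_variables` are assumed here and are not part of the original model; the variable codes are the ones used in equations (1) to (3).

```python
# Predictors of each construct, copied from equations (1) to (3).
# The dictionary layout itself is an assumption made for illustration.
MODEL = {
    "TI": ["WE", "OC", "A", "MS", "TWE"],  # equation (1)
    "WE": ["PWL", "FOR", "JA", "WFC"],     # equation (2)
    "OC": ["JA", "WFC", "FOR", "WE"],      # equation (3)
}


def root_variables(target, model=MODEL):
    """Return, in sorted order, the variables that `target` ultimately
    depends on and that have no equation of their own in the model."""
    roots = set()
    seen = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in model:
            # The variable is explained by other variables: follow them.
            stack.extend(model[name])
        else:
            # No equation for this variable: it is an input of the model.
            roots.add(name)
    return sorted(roots)


print(root_variables("TI"))
```

Work exhaustion and organizational commitment do not appear in the result for TI because the model explains them through other variables.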


8. Statement of research objectives 


Next, the research question(s) that were formulated need to be broken down as 
tasks or objectives that need to be met in order to answer the research question. 


This section makes active use of verbs such as ‘to find out’, ‘to determine’, 
‘to establish’, and ‘to measure’ so as to spell out the objectives of the study. In 
certain cases, the main objectives of the study might need to be broken down into 
sub-objectives which clearly state the tasks to be accomplished. 


In the organic food research, the objectives and sub-objectives of the study 
were as follows: 


1. To study the existing organic market:

   • To categorize the organic products available in Delhi into grain,
     snacks, herbs, pickles, squashes and fruits and vegetables;

   • To estimate the demand pattern of various products for each of
     the above categories;

   • To understand the marketing strategies adopted by different
     players for promoting and propagating organic products.

2. Consumer diagnostic research:

   • To study the existing consumer profile, i.e., perception and
     attitudes towards organic products and purchase and
     consumption patterns;

   • To study the potential customers in terms of consumer segments,
     level of awareness, perception and attitude towards health and
     organic products.

3. Opinion survey: To assess the awareness and opinions of experts such
as doctors, dieticians and chefs in order to understand organic
consumption.

Figure 2.3 summarizes the problem identification process.


[Figure 2.3: flowchart. Management Decision Problem, followed by four parallel
inputs (Discussion with subject experts, Review of existing literature,
Organization analysis, Qualitative analysis), then Management Research
Problem/Question, Research Framework/Analytical Model, Statement of Research
Objectives and Formulation of Research Hypothesis]

Fig. 2.3 Problem Identification Process


2.5 COMPONENTS OF THE RESEARCH 
PROBLEM 


To address the problems of clarity and focus, we need to understand the 
components of a well defined problem. These are: 


The unit of analysis 


The researcher must specify in the problem statement the individual(s) from whom 
the research information is to be collected and on whom the research results are 
applicable. This could be the entire organization, departments, groups or individuals. 


Research variables 


The research problem also requires identification of the key variables under study.
A variable is any concept that varies and to which we can assign numerals or values. A
variable may be dichotomous in nature, that is, it can possess only two values such
as male/female or customer/non-customer. Variables whose values can only fit into a
prescribed number of categories are discrete variables, for example, very important (1) to
very unimportant (5). There are still others, continuous variables, that can take an
indefinite set of values, e.g., age, income and production data.


Variables can be further classified into four categories, depending on the
role they play in the problem under consideration. These are:

• Independent variables
• Dependent variables
• Moderating variables
• Extraneous variables

• Independent variable: Any variable that can be stated as influencing or
impacting the dependent variable is referred to as an independent variable
(IV). More often than not, the task of the research study is to establish the
relationship between the independent and the dependent variable(s).

In the organic food study, the consumers’ attitude towards healthy
lifestyle could impact their organic purchase intention. Thus, attitude becomes
the independent and intention the dependent variable. Another researcher
might want to assess the impact of job autonomy and role stress on the
organizational commitment of the employees; here job autonomy and role
stress are independent variables.

• Dependent variable: The most important variable to be studied and
analysed in a research study is the effect, or dependent variable (DV). The entire
research process is involved in either describing this variable or investigating
the probable causes of the observed effect. Thus, this in essence has to be
a measurable variable. For example, in the organic food study, the
consumer’s purchase intentions, as well as sales of organic food products in
the domestic market, could serve as the dependent variable.

• Moderating variables: Moderating variables are the ones that have a
strong effect on the relationship between the independent and dependent
variables. These variables have to be considered in the expected pattern of
relationship as they modify the direction as well as the magnitude of the
independent-dependent association. In the organic food study, the strength
of the relation between attitude and intention might be modified by the
education and the income level of the buyer. Here, education and income
are the moderating variables (MVs).


There might be instances when confusion might arise between a moderating variable 
and an independent variable. Consider the following situation: 


Proposition 1: Turnover intention (DV) is an inverse function of organizational 
commitment (IV), especially for workers who have a higher job satisfaction level 
(MV). 


Another study might have the following proposition to test:

Proposition 2: Turnover intention (DV) is an inverse function of job satisfaction
(IV), especially for workers who have a higher organizational commitment (MV).

Thus, the two propositions are studying the relation between the same three
variables. However, the decision to classify one as independent and the other as
moderating depends on the research interest of the decision maker.


Extraneous variables: Besides the moderating variables, there might still exist a 
number of extraneous variables (EVs) which could affect the defined relationship 
but might have been excluded from the study. These would most often account for 
the chance variations observed in the research investigation. They might not heavily 
impact the direction of the findings. However, in case the effect is substantial, the 
researcher might try to block their effect by using an experimental and a control 
group (This concept will be discussed later in Unit 3). 


At this stage, we can clearly distinguish between the different kinds of 
variables discussed above. An independent variable is the most important cause 
which can explain the variance in the dependent variable. The moderating variable 
is a contributing variable which might affect the relationship between the independent 
and the dependent variable. The extraneous variables are outside the domain of 
the study and yet may also affect the dependent variable. 
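
A moderating variable can be seen at work by estimating the IV-DV relationship separately at each level of the moderator. The Python sketch below does this with an ordinary least-squares slope. It is only an illustration under stated assumptions: the records are invented for the example and are not data from the organic food study, and the names `slope`, `slopes_by_group` and `records` are hypothetical.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys (DV) on xs (IV)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_group(rows):
    """Estimate the IV-DV slope separately for each level of the moderator."""
    groups = {}
    for iv, dv, mv in rows:
        xs, ys = groups.setdefault(mv, ([], []))
        xs.append(iv)
        ys.append(dv)
    return {mv: slope(xs, ys) for mv, (xs, ys) in groups.items()}


# Invented records: (attitude score = IV, purchase intention = DV,
# education level = MV). The values are made up purely for illustration.
records = [
    (1, 2.0, "graduate"), (2, 4.0, "graduate"),
    (3, 6.0, "graduate"), (4, 8.0, "graduate"),
    (1, 3.0, "non-graduate"), (2, 3.5, "non-graduate"),
    (3, 4.0, "non-graduate"), (4, 4.5, "non-graduate"),
]

for level, value in slopes_by_group(records).items():
    print(level, round(value, 2))
```

In this invented data set the slope differs between the two education groups, which is the pattern a moderating variable produces. If the slopes were the same in every group, education would not be moderating the attitude-intention relationship.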


Check Your Progress 


4. Name some of the academically accepted referencing styles in management. 
5. State the advantage of developing a theoretical framework.
6. What is another name for causal variable? 


7. Which variable can affect the relationship between the independent and 
the dependent variable? 


2.6 FORMULATING THE RESEARCH 
HYPOTHESES 


The problem identification process ends in the hypotheses formulation stage. Any 
assumption that the researcher makes on the probable direction of the results that 
might be obtained on completion of the research process is termed as a hypothesis. 
Unlike the research problem that generally takes on a question form, the hypotheses 
are always in a sentence form. The statements thus made can then be empirically 
tested. Kerlinger (1986) defines a hypothesis as ‘...a conjectural statement of the 
relationship between two or more variables.’ According to Grinnell (1993), ‘A 
hypothesis is written in such a way that it can be proven or disproven by valid and
reliable data—it is in order to obtain these data that we perform our study’. 


While designing any hypotheses, there are a few criteria that the researcher 
must fulfill. These are: 


• A hypothesis must be formulated in simple, clear and declarative form.
A broad hypothesis might not be empirically testable. Thus, it might be
advisable to make the hypothesis unidimensional, testing only
one relationship between only two variables at a time.

  o Consumer liking for the electronic advertisement for the new diet drink
    will have a positive impact on brand awareness of the drink.
  o High organizational commitment will lead to lower turnover intention.

• A hypothesis must be measurable and quantifiable.

• A hypothesis is a conjectural statement based on the existing literature
and theories about the topic and not based on the gut feel of the
researcher.

• The validation of the hypothesis would necessarily involve testing the
statistical significance of the hypothesized relation.


2.6.1 Types of Research Hypotheses 


The formulated hypothesis could be of two types: 


Descriptive hypothesis: This is simply a statement about the magnitude, trend
or behaviour of a population under study. Based on past records, the researcher
makes some presumptions about the variable under study. For example:

• Students from the pure science background score 90-95 per cent on a
course on quantitative methods.

• The current advertisement for the diet drink will have a 20-25 per cent
recall rate.

• The literacy rate in the city of Indore is 100 per cent.

Relational hypothesis: These are the typical kind of hypotheses which state the
expected relationship between two variables. If, while stating the relation, the
researcher makes use of words such as increase, decrease, less than or more
than, the hypothesis is said to be a directional or one-tailed hypothesis.

For example,

• The higher the likeability of the advertisement, the higher the recall rate.

• The higher the work exhaustion experienced by the BPO professional, the higher
the turnover intention of the person.

However, sometimes the researcher might not have reasonable supportive
data to hypothesize the expected direction of the relationship. In this case he or
she would leave the hypothesis as non-directional or two-tailed.

For example,

• There is a relation between quality of working life and job satisfaction
experienced by employees.

• Ban on smoking has an impact on cigarette sales.

• Anxiety is related to performance.

The hypotheses discussed in this section are in a verbal sentence form. In
later sections, we will learn that they need to be reduced to a statistical form for any
data analysis to be done. The nature and formulation of the statistical hypotheses
will be discussed in Unit 10.
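
The terms one-tailed and two-tailed refer to how the significance of a test result is judged once the hypothesis is in statistical form. The Python sketch below shows the difference for a standardized test statistic. It rests on assumptions that are not made in this unit: the statistic `z` is taken to follow a standard normal distribution, and the names `upper_tail` and `p_value` are hypothetical.

```python
import math


def upper_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_value(z, directional, expected_sign=1):
    """p-value for a standardized test statistic z (assumed standard normal).

    directional=True  : one-tailed test; expected_sign is +1 when the
                        hypothesis predicts a positive effect, -1 when it
                        predicts a negative effect.
    directional=False : two-tailed test; only the size of z matters.
    """
    if expected_sign not in (1, -1):
        raise ValueError("expected_sign must be +1 or -1")
    if directional:
        return upper_tail(expected_sign * z)
    return 2 * upper_tail(abs(z))


z = 1.96  # hypothetical test statistic, for illustration only
print("two-tailed:", round(p_value(z, directional=False), 3))
print("one-tailed:", round(p_value(z, directional=True), 3))
```

For the same statistic, the one-tailed p-value is half the two-tailed one when the result falls in the hypothesized direction, which is why a directional hypothesis should be stated only when there is supportive data for the expected direction.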


2.7 WRITING A RESEARCH PROPOSAL 


We have learnt that research always begins with a purpose. Either this is the 
researcher’s own pursuit, or it is carried out to address and answer a specific 
managerial question and arrive at a solution. This clear statement of purpose guides 
the research process and must be converted into a plan for the study. This 
framework or plan is termed the research proposal. A research proposal is a 
formal document that presents the research objectives, the design for achieving 
these objectives and the expected outcomes/deliverables of the study. 


This step is essential both for academic and corporate research, as it clearly 
establishes the research process to be followed to address the research questions. 
In a business or corporate setting, this step is often preceded by a PR (Proposal 
Request). Here the manager or the corporate spells out his decision problem and 
requests the potential suppliers of research to work out a research plan/proposal 
to address the stated issues. 


A formal proposal also helps when the manager is not able to state his 
problem clearly, or when the researcher is not able to understand the decision 
and convert it into a workable research problem. The researcher lists the 
objectives of the study and then, together with the manager, reviews whether or 
not the listed objectives and direction of the study will be able to deliver output 
for arriving at a workable solution. 


For the researcher, the document provides an opportunity to identify any 
shortfalls in the logic or the assumption of the study. It also helps to monitor the 
methodical work being carried out to accomplish the project. 


2.7.1 Contents of a research proposal 


There is a broad framework that most proposals follow. In this section we will 
briefly discuss these steps. 


Executive summary 


This is a broad overview that gives the purpose and objective of the study. In a 
short paragraph, the author gives a summary about the management problem/ 
academic concern. 


Background of the problem 


This is the detailed background of the management problem. It requires a sequential 
and systematic build-up to the research questions and to why the study should 
be done. The researcher has to be able to demonstrate that there could be a 
number of ways in which the management dilemma could be answered. For 
example, a pharmaceutical company develops a new hair growing solution and 
packages it in two different types of bottles. They want to know which one people 
will buy. The product testing could be done internally in the company; or the two 
sample bottles could be formulated and tested for their acceptability amongst likely 
consumers or retailers keeping the product; or the two types could be developed, 
test launched and tested for their sales potential. The researcher thus has to 
spell out all possibilities and then systematically and logically argue for the research 
study. This section has to be objective and written in simple language, avoiding 
any metaphors or idioms to dramatize the plan. The logical arguments should speak 
for themselves and be able to convince the reader of the need for the study in 
order to find probable solutions to the management dilemma. 


Problem statement and research objectives 


The clear definition of the problem broken down into specific objectives is the 
next step. This section is crisp and to the point. It begins by stating the main thrust 
area of the study. For example, in the above case, the problem statement could 
be: 


To test the acceptability of a spray or capped bottle dispenser for a new 
hair growing formulation. 


The basic objectives of this research would be to: 


• Determine the comparative preference of the two prototypes amongst 
customers of hair growing solutions. 


• Conduct a sample usage test of both the bottles with the identified 
population. 


• Assess the ease of use for the bottles amongst the respondents. 


• Prepare a comparative analysis of the advantages and problems 
associated with each bottle, on the basis of the sample usage test. 


• Prepare a detailed report on the basis of the findings. 


If the study is addressed towards testing some assumptions in the form of 
hypotheses, they have to be clearly stated in this section. 


Research design 


This is the working section of the proposal, as it needs to indicate the logical and 
systematic approach intended to be followed in order to achieve the listed 
objectives. This would include specifying the population to be studied, the sampling 
process and plan, sample size and selection. It also details the information areas of 
the study and the probable sources of data, i.e., the data collection methods. In 
case the process has to include an instrument design, the intended approach 
needs to be detailed here. A note of caution has to be given here: this is not a 
simple statement of the sampling and data collection plan; it requires a clear and 
logical justification for using the chosen techniques over the other methods 
available for research. 


Scheduling the research 


The time-bound dissemination of the study with the major phases of the research 
has to be presented. This can be done using the CPM/GANTT/PERT charts. 
This gives a clear way for monitoring and managing the research task. It also has 
the additional benefit of providing the researcher with a means of spelling out the 
payment points linked to the delivered phase outputs. 
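

To make the scheduling idea concrete, the following sketch computes the earliest start and finish week of each research phase from its duration and its predecessors (the forward pass used in CPM) and prints a simple text version of a Gantt chart. The phase names, durations and dependencies are hypothetical, chosen only for illustration; a full CPM analysis would also include a backward pass to find slack.

```python
# Hypothetical phases: name -> (duration in weeks, phases that must finish first)
phases = {
    "proposal approval": (1, []),
    "instrument design": (2, ["proposal approval"]),
    "sampling plan":     (1, ["proposal approval"]),
    "data collection":   (4, ["instrument design", "sampling plan"]),
    "data analysis":     (2, ["data collection"]),
    "report writing":    (2, ["data analysis"]),
}

def schedule(phases):
    """Earliest (start, finish) week for each phase: the CPM forward pass."""
    timing = {}

    def finish(name):
        if name not in timing:
            duration, predecessors = phases[name]
            # A phase can start only when all its predecessors have finished.
            start = max((finish(p) for p in predecessors), default=0)
            timing[name] = (start, start + duration)
        return timing[name][1]

    for name in phases:
        finish(name)
    return timing

timing = schedule(phases)
total_weeks = max(end for _, end in timing.values())

# Text Gantt chart: one character per week, phases ordered by start week.
for name, (start, end) in sorted(timing.items(), key=lambda kv: kv[1]):
    bar = " " * start + "#" * (end - start)
    print(f"{name:<18} |{bar:<{total_weeks}}| weeks {start}-{end}")
print("Total duration:", total_weeks, "weeks")
```

The finish week of each phase is a natural point at which to attach a deliverable and, where relevant, a payment milestone.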


Results and outcomes of the research 


Here the clear terms of contract or expected outcomes of the study have to be 
spelt out. This is essential even if it is academic research. The expected 
deliverables need to clearly demonstrate how the researcher intends to link the 
findings of the proposed study design to the stated research objectives. For 
example, in the pharmaceutical study, the expected deliverables are: 


• To identify the usage problems with each bottle type. 
• To recommend, on the basis of the sample study, which bottle to use for 
packaging the liquid. 


Costing and budgeting the research 


In all instances of business research, both internal and external, an estimated cost 
of the study is required. 


In addition to these sections, academic research proposals require a section 
on review of related literature; this generally follows the ‘problem background’ 
section. If the proposal is meant to establish the credentials of the research supplier, 
then detailed qualifications of the research team, including the research experience 
in the required or related area, aid in the selection of the research proposal. 


Sometimes, the research study requires an understanding of some technical 
terms or explanations of the constructs under study; in such cases the researcher 
needs to attach a glossary of terms in the appendix of the research proposal. 


The last section of the proposal is to state the complete details of the 
references used in the formulation of the research proposal. Thus the data source 
and address have to be attached with the formulated document. 


2.7.2 Types of Research Proposals 

Basically, the proposals formulated could be of three types: 


• Academic research proposals 
• Internal organizational proposals 
• External organizational proposals 


Academic research proposal 


The academic research proposal might be generated by students or academicians 
pursuing the study for fundamental academic research. These kinds of studies need 
an extensive search of past studies and data on the topic of study. An example is an 
academician wanting to explore the viability of different eco-friendly packaging 
options available to a manufacturer. 


Internal organizational proposal 


The internal organizational proposals are prepared within an organization and 
are submitted to the management for approval and funding. They are of a highly 
focused nature and are oriented towards solving immediate problems. For example, 
a pharmaceutical company, which has developed a new hair growing formulation, 
wants to test whether to package the liquid in a spray type or capped dispenser. 
The solutions are time-driven and applicability is only for this product. These studies 
do not require extensive literature review but do require clearly stated research 
objectives, for the management to assess the nature of work required. 


External organizational proposals 


External organizational proposals have their base or origin within the company, but 
the scope and nature of the study require more structured and objective research. 
For example, if the above stated pharmaceutical company wishes to explore the 
herbal cosmetic market and wants a market analysis and feasibility study conducted, 
the PR might be spelt out to solicit proposals to address the research question 
and execute an outsourced research. 


Check Your Progress 


8. Name the hypothesis that talks about the relation between two or more 
variables. 
9. What should the researcher do when he/she does not have reasonable 
supportive data to hypothesize the expected direction of the relationship? 
10. What is the research proposal preceded by in a corporate or business 
setting? 
11. Mention the tools through which the time-bound dissemination of the study 
with the major phases of the research is presented. 


2.8 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. The management decision problem must be reduced to a research problem 
which can lead to a scientific enquiry. 


2. Linear relationships are tested under simple research problems. 


3. The problems faced by the decision maker at the start of the problem 
recognition process might be related to actual and immediate difficulties 
faced by the manager or to gaps experienced in the existing body of 
knowledge. 


4. Some of the academically accepted referencing styles in management are 
the Publication Manual of the American Psychological Association (2001) 
and the Chicago Manual of Style (1993). 


5. The advantage of developing a theoretical framework is that it clearly shows 
the expected direction of the relationships between the concepts. There is 
also an indication of whether the relationship would be positive or negative. 


6. Independent variable is another name for causal variable. 


7. Moderating variable is the contributing variable which may affect the 
relationship between the independent and the dependent variable. 


8. The hypothesis that talks about the relation between two or more variables 
is known as the relational hypothesis. 


9. When the researcher does not have reasonable supportive data to hypothesize 
the expected direction of the relationship between variables, he/she should 
leave the hypothesis as non-directional or two-tailed. 


10. In a business or corporate setting, the research proposal is preceded by a 
PR (Proposal Request). Here the manager or the corporate spells out his 
decision problem and requests the potential suppliers of research to work 
out a research plan/proposal to address the stated issues. 


11. The time-bound dissemination of the study with the major phases of the 
research is presented using the CPM/GANTT/PERT charts. 


2.9 SUMMARY 


• The most important step in research is to identify the decision to be made 
and how it can be converted into a research problem. 


• The problem definition process is a well-integrated, linked and stepwise 
process. 


• There are some essential elements of a typical research problem. These 
include the unit of analysis, which is the individual or group that is to be 
studied. The second element is a clear definition of the variables under 
study. 


• At this stage, the researcher should be able to specify which is the causal or 
independent variable and which is the effect or dependent variable under 
study. Also, it is best to acknowledge the effect or presence of any external 
variables which might have a contingent effect on the cause and effect 
relationship that is to be studied. These can be further classified as moderator, 
intervening, and extraneous variables. 


• It is advisable for the researcher to construct a model or theoretical framework 
based on the process of problem formulation. This is a recommended but 
not necessarily an essential step, as some studies might be of a nature that 
the intent is to conduct the study and then arrive at a theory or a model. 


• The problem formulation process ultimately ends as a research hypothesis. 


• The entire stepwise process is then laid out in the shape of a formal plan to 
be followed. This is called the research proposal. 


• There are three different kinds of research proposals available to the 
researcher: academic, internal and external. 


2.10 KEY WORDS 


• Dependent variable: The outcome or effect that is being studied in the 
research. 


• Extraneous variable: Any variable that may have an effect on the dependent 
variable and is not part of the study. 


• Hypothesis: Any pre-supposition made about the likely outcome of the 
study. 


• Independent variable: The variables that might have an effect on the 
dependent variable. 


• Unit of analysis: The respondent population to be studied. 


2.11 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 
1. How would you distinguish between a management decision problem and a 
management research problem? 


2. What is a research hypothesis? Does all research require hypothesis 
formulation? 


3. What are the steps involved in writing a research proposal? Give examples. 


Long-Answer Questions 


1. Do all decision problems require research? Explain and illustrate with 
examples. 


2. Explain the stepwise process of problem identification with an example. 


3. What are the components of a sound research problem? Illustrate with 
examples. 


4. Explain the different types of hypotheses available for research with 
examples. 


5. What are the different kinds of research proposals that can be formulated? 


3.0 INTRODUCTION 


In the last unit, we studied the defining of the research problem and the formulation 
of the research hypothesis. However, in research, it is not enough to define the 
problem and formulate the hypotheses. It has been found by research scholars and 
managers alike that most research studies do not result in any significant findings 
because of a faulty research design. Most researchers feel that once the problem 
is defined and hypotheses are made, one can go ahead and collect the data on a 
specified group, or sample, and then analyse it using statistical tests. However, 
unless the formulated research problem and the study hypotheses are tested 
through a well-defined plan, answers are going to be based on hit and trial rather 
than any sound logic. 


The design approaches available to the researcher are many and will depend 
on whether the study is of descriptive or conclusive nature. The designs range 
from very simple, loosely structured to highly scientific experimentation. In this 
unit, we will study the complete choice of designs, along with detailed reasoning 
on which design should be used under what conditions. Just as in scientific 
experiments, in business research too there are chances of error, and these need 
to be understood and controlled for more accurate results for the decision maker. 


3.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Describe the nature of research designs 
• Explain exploratory research designs 
• Discuss the designs used for descriptive studies 
• Describe the range of experimental designs available 
• Identify and control the errors in research designs 


3.2 MEANING, NATURE AND CLASSIFICATION 
OF RESEARCH DESIGNS 


Once you have established the ‘what’ of the study, i.e., the research problem, the 
next step is the ‘how’ of the study, which specifies the method of achieving the 
research objectives. In other words, this is the research design. 


Green et al. (2008) define research design as ‘the specification of methods 
and procedures for acquiring the information needed. It is the overall operational 
pattern or framework of the project that stipulates what information is to be 
collected from which sources by what procedures. If it is a good design, it will 
ensure that the information obtained is relevant to the research questions and that 
it was collected by objective and economical procedures.’ 


Thyer (1993) states that, ‘A traditional research design is a blueprint or 
detailed plan for how a research study is to be completed—operationalizing 
variables so they can be measured, selecting a sample of interest to study, collecting 
data to be used as a basis for testing hypotheses, and analysing the results.’ Selltiz 
et al. (1962) state that, ‘A research design is the arrangement of conditions for 
collection and analysis of data in a manner that aims to combine relevance to the 
research purpose with economy in procedure.’ 


One of the most comprehensive and holistic definitions has been given by 
Kerlinger (1995). He refers to a research design as, ‘... a plan, structure and 
strategy of investigation so conceived as to obtain answers to research questions 
or problems. The plan is the complete scheme or programme of the research. It 
includes an outline of what the investigator will do from writing the hypotheses and 
their operational implications to the final analysis of data.’ 


Thus, the formulated design must ensure three basic principles: 
(a) Convert the research question and the stated assumptions/hypotheses into 
variables that can be measured. 
(b) Specify the process to complete the above task. 


(c) Specify the ‘control mechanism(s)’ to follow so that the effect of other 
variables that could influence the outcome of the study is controlled. 


At this stage, one needs to understand the difference between research 
design and research method. While the design is the specific framework that has 
been created to seek answers to the research question, the research method is the 
technique to collect the information required to answer the research problem, 
given the created framework. Thus, research designs have a critical and directive 
role to play in the research process. The execution details of the research question 
to be investigated are referred to as the research design. 


The researcher has a number of designs available to him for investigating the 
research objectives. The classification that is universally followed is the one based 
upon the objective or the purpose of the study. A simple classification is one based 
upon the research needs, ranging from simple and loosely structured to specific 
and more formally structured. The best way is to view the designs on a continuum 
as shown in Figure 3.1. Hence, in case the research objective is diffuse and 
requires refinement, one uses the exploratory design, and this might lead to the 
slightly more concrete descriptive design, in which one describes all the aspects of 
the constructs and concepts under study. This in turn leads to a more structured and 
controlled experimental research design. 


Figure 3.1 illustrates research designs as a continuous process. 


Fig. 3.1 Research Designs: A Continuous Process 


In the following sections, you will study the broad classification of research design. 


3.3 EXPLORATORY RESEARCH DESIGNS 


Exploratory designs, as stated earlier, are the simplest and most loosely structured 
designs. As the name suggests, the basic objective of the study is to explore and 
obtain clarity about the problem situation. It is flexible in its approach and mostly 
involves a qualitative investigation. The sample is not strictly representative 
and at times it might only involve unstructured interviews with a couple of subject 
experts. The essential purpose of the study is to: 


• Define and understand the research problem to be investigated. 
• Explore and evaluate the diverse and multiple research opportunities. 
• Assist in the development and formulation of the research hypotheses. 
• Define the variables and constructs under study. 
• Identify the possible nature of relationships that might exist between the 
variables under study. 
• Explore the external factors and variables that might impact the research. 


For example, a university professor might decide to do an exploratory 
analysis of the new channels of distribution that are being used by marketers to 
promote and sell products and services. To do this, a structured and defined 
methodology might not be essential, as the basic objective is to understand how to 
teach this to students of marketing. The researcher can make use of different 
methods and techniques in exploratory research, such as secondary data sources, 
unstructured or structured observations, expert interviews and focus group 
discussions with the concerned respondent group. Here, we will discuss them in 
brief in the light of their use in exploratory research. 


3.3.1 Secondary Resource Analysis 


Secondary sources of data, as the name suggests, are data in terms of the details 
of previously collected findings in facts and figures, which have been authenticated 
and published. It is a fast and inexpensive way of collecting information. The past 
details can sometimes point out to the researcher that his proposed research is 
redundant and has already been established earlier. Secondly, the researcher might 
find that a small but significant aspect of the concept has not been addressed and 
should be studied. For example, a marketer might have extensively studied the 
potential of the different channels of communication for promoting a ‘home 
maintenance service’ in Greater Mumbai. However, no mix that he has tested 
has had any impact. An anthropologist research associate, on going through the 
findings, postulated the need for studying the potential of WOM (word of mouth) 
in a close-knit and predominantly Parsi colony where this might be the most effective 
culture-dependent technique. Thus, such insights might provide 
leads for carrying out experimental and conclusive research subsequently. 


Another valuable secondary resource is the compiled and readily available 
databases of the entire industry, business or construct. These might be available 
on free and public domains or through a structured acquisition process and cost. 
These are both government and non-government publications. Based on the 
resources and the level of accuracy required, the researcher might decide to make 
use of them. 


3.3.2 Case Study Method 


Another way of conducting exploratory research is the case study method. 
This requires an in-depth study and is focused on a single unit of analysis. This unit 
could be an employee or a customer, an organization or a complete country. 
Case studies are, by their nature, generally post-hoc studies and report incidents 
which have occurred earlier. The scenario is reproduced based upon the 
secondary information and a primary interview/discussion with those involved in 
the occurrence. Thus, there might be an element of bias as the data, in most cases, 
becomes a judgemental analysis rather than a simple recounting of events. 


For example, BCA Corporation wants to implement a performance appraisal 
system in the organization and is debating between the merits of a traditional appraisal 
system and a 360° appraisal system. For a historical understanding of the two 
techniques, the HR director makes use of books on the subject. However, for 
better understanding, he should do an in-depth case accounting of Allied 
Association, which had implemented traditional appraisal formats, and Surakhsha 
International, which uses 360° appraisal systems. Together, the two exploratory 
studies would be sufficient to arrive at a decision in terms of what 
would be best for the organization. 


3.3.3 Expert Opinion Survey 


At times, there might be a situation when the topic of a research is such that there 
is no previous information available on it. In these cases, it is advisable to seek 
help from experts who might be able to provide some valuable insights based 
upon their experience in the field or with the concept. This approach of collecting 
particulars from significant and knowledgeable people is referred to as the expert 
opinion survey. This methodology might be formal and structured, in which case 
it is useful when authenticated or supported by a secondary/primary research, or 
it might be fluid and unstructured and require in-depth interviewing of the expert. 
For example, the evaluation of the merit of marketing organic food products in the 
domestic Indian market cannot be done with the help of secondary data as no 
such structured data sources exist. In this case the following can be contacted: 


• Doctors and dieticians, who as experts would be able to provide information 
on whether consumers would eat organic food products as a healthier 
alternative. 


• Chefs who are experimental and would like to look at providing better 
value to their clients. 


• Retailers who like to sell contemporary new products. 


These could be useful in measuring the viability of the proposed plan. 
Discussions with knowledgeable people may reveal some information regarding 
who might be considered as potential consumers. Secondly, the question of whether 
a health proposition or a lifestyle proposition would work better to capture the 
targeted consumers needs to be examined. Thus, this method can play a directional 
role in shaping the research study. 


3.3.4 Focus Group Discussions 


Another way to conduct an exploratory analysis is to carry out discussions with 
individuals associated with the problem under study. This technique, though 
originally from sociology, is actively used in business research. In a typical focus 
group, there is a carefully selected small set of individuals representative of the 
larger respondent population under study. It is called a focus group as the selected 
members discuss the concerned topic for a duration of 90 minutes to, sometimes, 
two hours. Usually the group is made up of six to ten individuals. Fewer than six 
would not be able to throw up enough perspectives for the discussion, and a 
one-sided discussion on the topic might emerge. On the other hand, more than 
ten might lead to confusion rather than fruitful discussion and would be unwieldy 
to manage. Generally, these discussions 
are carried out in neutral settings by a trained observer, also referred to as the 
moderator. The moderator, in most cases, does not participate in the discussion. 
His prime objective is to manage a relatively non-structured and informal discussion. 
He initiates the process and then steers it towards the desired 
information needs. Sometimes, there is more than one observer to record the 
verbal and non-verbal content of the discussion. Conducting and recording 
the dialogue requires considerable skill, behavioural understanding and the 
management of group dynamics. In the organic food product study, the focus 
group discussions were carried out with the typical consumers/buyers of grocery 
products. The objective was to establish the level of awareness about health hazards, 
environmental concerns and organic food products. A series of such 
focus group discussions carried out across four metros (Delhi, Mumbai, Bengaluru 
and Hyderabad) revealed that even though the new age consumer was concerned 
about health, the awareness about organic products varied from extremely low to 
non-existent. (This study was carried out in the year 2004-05 by one of the 
authors for an NGO located in Delhi.) 


Check Your Progress 


1. State the difference between the research design and research method. 
2. Which unit of analysis is the focus of the case study method? 


3. Define expert opinion survey. 


3.4 DESCRIPTIVE RESEARCH DESIGNS 


As the name implies, the objective of descriptive research studies is to provide a 
comprehensive and detailed explanation of the phenomena under study. The 
intended objective might be to give a detailed sketch or profile of the respondent 
population being studied. For example, to design an advertising and sales promotion 
campaign for high-end watches, a marketer would require a holistic profile of the 
population that buys such luxury products. Thus a descriptive study (which 
generates data on the who, what, when, where, why and how of luxury accessory 
brand purchase) would be the design necessary to fulfil the research objectives. 


Descriptive research studies are thus conclusive studies. Although they lack the 
precision and accuracy of experimental designs, they lend themselves to a wide range 
of situations and are more frequently used in business research. Based on the time 
period of the collection of the research information, descriptive research is further 
subdivided into two categories: cross-sectional studies and longitudinal studies. 


3.4.1 Cross-sectional Studies 


As the name suggests, cross-sectional studies involve a slice of the population. 
Just as in scientific experiments one takes a cross-section of the leaf or the cheek 
cells to study the cell structure under the microscope, similarly one takes a current 
subdivision of the population and studies the nature of the relevant variables being 
investigated. 


There are two essential characteristics of cross-sectional studies: 


• The cross-sectional study is carried out at a single moment in time and thus 
the applicability is most relevant for a specific period. For example, one 
cross-sectional study was conducted in 2002 to study the attitude of 
Americans towards Asian-Americans, after the 9/11 terrorist attack. This 
revealed mistrust towards Asians. Another cross-sectional study 
conducted in 2012 to study the attitude of Americans towards 
Asian-Americans revealed more acceptance and less mistrust. Thus the 
findings of the two cross-sectional studies cannot be used interchangeably. 


• Secondly, these studies are carried out on a section of respondents from 
the population units under study (e.g., organizational employees, voters, 
consumers, industry sectors). This sample is under consideration and under 
investigation only for the time coordinate of the study. 


There are also situations in which the population being studied is not of a 
homogeneous nature but composed of different groups. Thus it becomes essential 
to study the sub-segments independently. This variation of the design is termed 
multiple cross-sectional studies. Usually this multi-sample analysis is carried out 
at the same moment in time. However, there might be instances when the data is 
obtained from different samples at different time intervals and then compared. 
Cohort analysis is the name given to such cross-sectional surveys 
conducted on different sample groups at different time intervals. Cohorts are 
essentially groups of people who share a time period or have experienced an event 
that took place at a particular time period. For example, in the post-9/11 
cross-sectional study done in 2002, we study and compare the attitudes of middle-aged 
Americans versus teenaged Americans towards Asian-Americans. These two 
American groups are separate cohorts and this would be a cohort analysis. Thus 
the teenaged Americans form one cohort and the middle-aged Americans form a 
separate cohort that thinks differently. 
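

The mechanics of such a comparison can be sketched in a few lines of code. The records below are entirely invented (cohort label, survey year and an attitude score on an assumed 1-5 scale); the point is only to show that each cohort and survey year is summarized separately and that the respondents differ from one survey to the next.

```python
from collections import defaultdict
from statistics import mean

# Invented survey records: (cohort, survey year, attitude score on a 1-5 scale).
# Each survey year uses a fresh sample, as in a cross-sectional design.
responses = [
    ("teenaged", 2002, 3), ("teenaged", 2002, 4), ("teenaged", 2002, 3),
    ("middle-aged", 2002, 2), ("middle-aged", 2002, 1), ("middle-aged", 2002, 2),
    ("teenaged", 2012, 4), ("teenaged", 2012, 5),
    ("middle-aged", 2012, 3), ("middle-aged", 2012, 4),
]

def cohort_means(records):
    """Mean score for every (cohort, survey year) combination."""
    groups = defaultdict(list)
    for cohort, year, score in records:
        groups[(cohort, year)].append(score)
    return {key: mean(scores) for key, scores in groups.items()}

means = cohort_means(responses)
for (cohort, year), avg in sorted(means.items()):
    print(f"{cohort:<12} {year}  mean attitude = {avg:.2f}")
```

Comparing rows within a year contrasts the cohorts at one moment in time; comparing the same cohort label across years contrasts different samples taken at different time intervals.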


The technique is especially useful in predicting election results: cohorts such as
males and females, different religious sects, urban and rural voters or
region-wise groups are studied by leading opinion poll experts like Nielsen,
Gallup and others. Thus, cross-sectional studies are extremely useful for
studying current patterns of behaviour or opinion.


3.4.2 Longitudinal Studies 


A single sample of the identified population that is studied over a longer period of 
time is termed as a longitudinal study design. A panel of consumers specifically 
chosen to study their grocery purchase pattern is an example of a longitudinal 
design. There are certain distinguishing features of the same: 


• The study involves the selection of a representative panel, or a group of
individuals that typically represent the population under study.


• The second feature involves the repeated measurement of the group over
fixed intervals of time. This measurement is specifically made for the variables
under study.


• A distinguishing and mandatory feature of the design is that once the sample
is selected, it needs to stay constant over the period of the study. That
means the number of panel members has to be the same. Thus, in case a
panel member due to some reason leaves the panel, it is critical to replace
him/her with a representative member from the population under study.


A longitudinal study using the same section of respondents thus provides more
accurate data than one using a series of different samples. Panels of this kind
are defined as true panels, and the ones using a different group every time are
called omnibus panels. The first advantage of a true panel is that it has a more
committed sample group that is likely to tolerate extended or long data collecting
sessions. Secondly, collecting the profile information is a one-time task and
need not be repeated every time. Thus, useful respondent time can be spent on
collecting research-specific information.


However, the problem is getting a committed group of people for the entire 
study period. Secondly, there is an element of mortality and attrition where the 
members of the panel might leave midway and the replaced new recruits might be 
vastly different and could skew the results in an absolutely different direction. A 
third disadvantage is the highly structured study situation which might be responsible 
for a consistent and structured behaviour, which might not be the case in the real 
or field conditions. 


Check Your Progress 


4. Define cohort analysis. 


5. What are omnibus panels? 


3.5 EXPERIMENTAL DESIGNS 


Experimental designs are conducted to infer causality. In an experiment, a 
researcher actively manipulates one or more causal variables and measures their 
effects on the dependent variables of interest. Since any changes in the dependent 
variable may be caused by a number of other variables, the relationship between 
cause and effect often tends to be probabilistic in nature. It is virtually impossible 
to prove a causality. One can only infer a cause-and-effect relationship. 


The necessary conditions for making causal inferences are: (i) concomitant
variation, (ii) time order of occurrence of variables and (iii) absence of other possible
causal factors. The first condition implies that the cause and effect variables should
have a high correlation. The second condition means that the causal variable must
occur prior to or simultaneously with the effect variable. The third condition means
that all other variables except the one whose influence we are trying to study should
be absent or kept constant.
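
The first condition, concomitant variation, can be illustrated with a small
calculation. The sketch below is only an illustration: the advertising and sales
figures are hypothetical, and the helper name pearson_r is an assumption
introduced here, not something taken from the text. It computes the Pearson
correlation coefficient between a presumed cause and a presumed effect.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: advertising spend (presumed cause) and sales
# (presumed effect) observed over six periods.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 110, 128, 140, 155, 180]

r = pearson_r(ad_spend, sales)
print(round(r, 3))
```

A value of r close to +1 or -1 satisfies only the first condition. The time order
of the variables and the absence of other causal factors still have to be
established separately before a cause-and-effect relationship can be inferred.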


There are two conditions that should be satisfied while conducting an experiment. 
These are: 


(i) Internal validity: Internal validity examines whether the observed
effect on a dependent variable is actually caused by the treatments
(independent variables) in question. For an experiment to possess
internal validity, all the other causal factors except the one whose influence
is being examined should be absent. Control of extraneous variables is a
necessary condition for inferring causality. Without internal validity, the
experiment gets confounded.


(ii) External validity: External validity refers to the generalization of the results
of an experiment. The concern is whether the result of an experiment can be
generalized beyond the experimental situation. If it is possible to generalize
the results, the next question is to what populations, settings, times,
independent variables and dependent variables the results can be projected. It
is desirable to have an experiment that is valid both internally and externally.
However, in reality, a researcher might have to trade off one type of
validity against the other. To remove the influence of an extraneous variable, a
researcher may set up an experiment in an artificial setting, thereby increasing
its internal validity. However, in the process the external validity will be
reduced.

There are four types of experimental designs. These are explained below:


1. Pre-experimental designs: There are three designs under this category: the
one-shot case study, where an observation is taken after the application of
the treatment; the one-group pretest-posttest design, where one observation is
taken prior to the application of the treatment and another after it; and the
static group comparison, where there are two groups, an experimental group and
a control group. The experimental group is subjected to the treatment and a
post-test measurement is taken. In the control group, the measurement is taken
at the same time as for the experimental group. These designs do not make use
of any randomization procedures to control the extraneous variables.
Therefore, the internal validity of such designs is questionable.


2. Quasi-experimental designs: In these designs the researcher can control 
when measurements are taken and on whom they are taken. However, this 
design lacks complete control of scheduling of treatment and also lacks the 
ability to randomize test units’ exposure to treatments. As the experimental 
control is lacking, the possibility of getting confounded results is very high. 
Therefore, the researchers should be aware of what variables are not 
controlled and the effects of such variables should be incorporated into the 
findings. 

3. True experimental designs: In these designs, researchers can randomly 
assign test units and treatments to an experimental group. Here, the researcher 
is able to eliminate the effect of extraneous variables from both the 
experimental and control group. Randomization procedure allows the 
researcher the use of statistical techniques for analysing the experimental 
results. 


4. Statistical designs: These designs allow for statistical control and analysis 
of external variables. The main advantages of statistical design are the 
following: 

• The effect of more than one level of an independent variable on the
dependent variable can be measured.

• The effect of more than one independent variable can be examined.

• The effect of a specific extraneous variable can be controlled.

Statistical design includes the following designs:


(i) Completely randomized design: This design is used when a
researcher is investigating the effect of one independent variable on
the dependent variable. The independent variable is required to be
measured on a nominal scale, i.e., it should have a number of categories.
Each of the categories of the independent variable is considered as
a treatment. The basic assumption of this design is that there are no
differences in the test units. All the test units are treated alike and
randomly assigned to the test groups. This means that there are no
extraneous variables that could influence the outcome.

Suppose we know that the sales of a product are influenced by the
price level. In this case, sales are the dependent variable and the price is
the independent variable. Let there be three levels of price, namely,
low, medium and high. We wish to determine the most effective price
level, i.e., the price level at which sales are highest. Here, the test units are
the stores, which are randomly assigned to the three treatment levels.
The average sales for each price level are computed and examined to
see whether there is any significant difference in sales at the various
price levels. The statistical technique used to test for such a difference is
called analysis of variance (ANOVA).


The main limitation of the completely randomized design is that it does
not take into account the effect of extraneous variables on the dependent
variable. The possible extraneous variables in the present example
could be the size of the store, the competitor's price and the price of
substitutes for the product in question. This design assumes that all the
extraneous factors have the same influence on all the test units, which
may not be true in reality. On the other hand, the design is very simple
and inexpensive to conduct.
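
The pricing example can be sketched in a few lines of code. This is a minimal
illustration under stated assumptions, not a prescribed procedure: the store
names, the sales figures and the helper one_way_anova_f are all hypothetical and
introduced only for this sketch. Twelve stores are randomly assigned to the three
price levels, and the F statistic of a one-way analysis of variance is computed
from the sales observed at each level.

```python
import random


def one_way_anova_f(groups):
    """Return the F statistic of a one-way analysis of variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)      # number of treatments
    n = len(all_values)  # total number of test units
    # Variation of the treatment means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own treatment mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Randomly assign 12 hypothetical stores to three price levels, 4 stores each.
rng = random.Random(42)
stores = [f"store_{i}" for i in range(1, 13)]
rng.shuffle(stores)
assignment = {"low": stores[0:4], "medium": stores[4:8], "high": stores[8:12]}

# Hypothetical sales recorded in the four stores at each price level.
sales = {
    "low": [52, 55, 50, 53],
    "medium": [45, 47, 44, 46],
    "high": [38, 40, 37, 39],
}

f_stat = one_way_anova_f(list(sales.values()))
print(assignment)
print(round(f_stat, 2))
```

The computed F value would then be compared with the tabulated critical value for
(k - 1, n - k) degrees of freedom, here (2, 9), to judge whether the difference
in average sales across the price levels is significant.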


(ii) Randomized block design: As discussed, the main limitation of the
completely randomized design is that all extraneous variables are
assumed to be constant over all the treatment groups. This may not
be true. There may be extraneous variables influencing the dependent
variable. In the randomized block design, it is possible to separate the
influence of one extraneous variable on a particular dependent variable,
thereby providing a clear picture of the impact of the treatment on the
test units.

In the example considered in the completely randomized design, the
price level (low, medium and high) was considered as the independent
variable and all the test units (stores) were assumed to be more or less
equal. However, all stores may not be of the same size and, therefore,
can be classified as small, medium and large stores. In this design,
the levels of an extraneous variable, such as the size of the store, could be
treated as different blocks. The treatments are then randomly assigned within
the blocks in such a way that each treatment appears in each block at
least once. The purpose of forming these blocks is the hope that
the scores of the test units within each block would be more or less
homogeneous when the treatment is absent. What is assumed here is
that the blocking variable (size of the store) is correlated with the dependent
variable (sales). It may be noted that blocking is done prior to the application
of the treatment.


In this experiment, one might randomly assign 12 small-sized stores to
three price levels in such a way that there are four stores for each of
the three price levels. Similarly, 12 medium-sized stores and
12 large-sized stores may be randomly assigned to the three price levels.
The technique of analysis of variance could then be employed to
analyse the effect of the treatment on the dependent variable and
to separate out the influence of the extraneous variable (size of store)
from the experiment.
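
The blocked assignment described above can be sketched as follows. This is an
illustrative sketch only: the store labels and the function name
randomized_block_assignment are hypothetical. Within each block (store size), the
12 stores are shuffled and split evenly, so that every price level appears four
times in every block.

```python
import random


def randomized_block_assignment(blocks, treatments, rng):
    """Within each block, shuffle the units and split them evenly across treatments."""
    plan = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError("units per block must be a multiple of the number of treatments")
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization happens separately inside each block
        per_treatment = len(shuffled) // len(treatments)
        plan[block_name] = {
            treatment: shuffled[i * per_treatment:(i + 1) * per_treatment]
            for i, treatment in enumerate(treatments)
        }
    return plan


# Hypothetical test units: 12 stores in each of the three size blocks.
blocks = {
    size: [f"{size}_store_{i}" for i in range(1, 13)]
    for size in ("small", "medium", "large")
}
treatments = ["low", "medium", "high"]  # price levels

plan = randomized_block_assignment(blocks, treatments, random.Random(7))
for size, groups in plan.items():
    print(size, {price: len(stores) for price, stores in groups.items()})
```

Because the randomization is carried out inside each block, store size is balanced
across the price levels by construction, which is what allows its influence to be
separated from the treatment effect in the subsequent analysis of variance.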


(iii) Factorial design: A factorial design may be employed to measure
the effect of two or more independent variables at various levels.
Factorial designs allow for interaction between the variables. An
interaction is said to take place when the simultaneous effect of two or
more variables is different from the sum of their individual effects. For
example, an individual may have a high preference for mangoes and may also
like ice cream, but this does not mean that he or she would like mango ice
cream; such a case indicates an interaction.


The sales of a product may be influenced by two factors, namely,
price level and store size. There may be three levels of price: low
(A1), medium (A2) and high (A3). The store size could be categorized
into small (B1) and big (B2). This could be conceptualized as a
two-factor design with information reported in the form of a table. In the
table, each level of one factor may be presented as a row and each
level of the other factor as a column. This example
could be summarized in the form of a table having three rows and two
columns. This would require 3 x 2 = 6 cells. Therefore, six different
treatment combinations would be produced, each with a
specific level of price and store size. The respondents would be
randomly selected and randomly assigned to the six cells.


The tabular presentation of 3 x 2 factorial design is given in 
Table 3.1. 


Table 3.1 3 x 2 Factorial Design for Price Level and Store Size

Price | Small (B1) | Big (B2)
Low Level (A1) | A1B1 | A1B2
Medium Level (A2) | A2B1 | A2B2
High Level (A3) | A3B1 | A3B2


Respondents in each cell receive a specified treatment combination.
For example, respondents in the upper left-hand corner cell would
face a low price level and a small store. Similarly, the respondents in
the lower right-hand corner cell would be subjected to both a high price
level and a big store.


The main advantages of factorial design are:


• It is possible to measure the main effects and interaction effect
of two or more independent variables at various levels.


• It allows a saving of time and effort because all observations are
employed to study the effects of each factor.


• The conclusion reached using factorial design has broader
applications as each factor is studied with different combinations
of other factors.


The limitation of this design is that the number of combinations (number of 
cells) increases with increased number of factors and levels. However, a fractional 
factorial design could be used if interest is in studying only a few of the interactions 
or main effects. 
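
How quickly the number of cells grows can be seen from a short sketch. The code
below is illustrative only: the labels follow Table 3.1, while the helper
number_of_cells and the third factor in the usage note are assumptions added
here. It enumerates the treatment combinations of the 3 x 2 design and counts the
cells for any number of factors.

```python
from itertools import product
from math import prod

price_levels = ["A1 (low)", "A2 (medium)", "A3 (high)"]
store_sizes = ["B1 (small)", "B2 (big)"]

# Every cell of the factorial design is one (price level, store size) pair.
cells = list(product(price_levels, store_sizes))
for price, size in cells:
    print(price, "x", size)


def number_of_cells(levels_per_factor):
    """Number of treatment combinations: the product of the levels of each factor."""
    return prod(levels_per_factor)


print(number_of_cells([3, 2]))     # price level x store size
print(number_of_cells([3, 2, 4]))  # adding a hypothetical third factor with 4 levels
```

Adding a single hypothetical factor with four levels takes the design from 6 to
24 cells, each of which needs its own group of respondents. This is the practical
reason for considering a fractional factorial design.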


3.6 ERRORS AFFECTING RESEARCH DESIGN 


We have discussed three types of research designs, namely, exploratory, descriptive 
and experimental. All of these have some scope of error. There could be various 
sources of errors in research design. 


Exploratory research is conducted using focus group discussion, secondary
data, analysis of case study and expert opinion survey. It is quite possible that
members of the focus group have not been selected properly. Secondary data may
not be free from errors (in fact, one needs to evaluate the methodology used in
collecting such data). Also, the experts chosen for the survey may not be experts
in the field. As a matter of fact, getting an expert is a very difficult task. All
these factors could lead to errors in the exploratory design.


In the descriptive design, the purpose is to describe a phenomenon. For this,
one could use a structured questionnaire. It could always happen that the
respondents do not give correct responses to some of the questions, thereby
resulting in wrong information.


In the true experimental design and statistical design, the respondents are
selected at random, which may not be the case in real life. Many times, in actual
business situations, value judgements play a very important role in selecting the
respondents. Further, there can always be errors in observations.


Check Your Progress 


6. Is it possible to prove a causality? 


7. Name the type of experimental design in which researchers can randomly 
assign test units and treatments to an experimental group. 


8. State the limitation of the factorial design.


3.7 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. The difference between research design and research methods is that while
the design is the specific framework that has been created to seek answers
to the research question, the research method is the technique to collect the
information required to answer the research problem, given the created
framework.

2. The case study method is focused on a single unit of analysis.

3. An expert opinion survey is the approach of collecting particulars from
significant and knowledgeable people.

4. Cohort analysis is the name given to cross-sectional surveys conducted
on different sample groups at different time intervals.

5. Omnibus panels are the type of longitudinal study using a different group
every time.

6. It is virtually impossible to prove a causality. One can only infer a
cause-and-effect relationship.

7. The type of experimental design in which researchers can randomly assign
test units and treatments to an experimental group is the true experimental
design.

8. The limitation of the factorial design is that the number of combinations
increases with increased number of factors and levels.


3.8 SUMMARY


• Research design is the blueprint or the framework for carrying out the
research study.


• The researcher has a number of designs available for investigating
the research objectives. Based upon the objective or the purpose of the
study, research design may be exploratory, descriptive or experimental.


• Exploratory designs are loosely structured and investigative in nature.


• In case the hypothesis formulated is descriptive in nature, the study design
would also be descriptive. The study involves collecting the who, what,
why, where, when and how about the population under study.


• One type of descriptive study is the cross-sectional design, i.e., studying
a section of the population at a single time period. In case the study is
conducted on a single population, it is called a single cross-sectional design,
and in case it is done on more than one segment, it is called a multiple
cross-sectional design.


• Another type of descriptive design is the longitudinal design. Here, a selected
sample is studied at different intervals (fixed) of time to measure the
variable(s) under study.


• Experimental designs are conducted to infer causality. There are four types
of experimental designs: pre-experimental designs, quasi-experimental
designs, true experimental designs and statistical designs.


3.9 KEY WORDS 


• Case study method: An in-depth study of a single unit of analysis. This
could be an employee, the owner, a customer, a company or even a country.


• Cross-sectional designs: A descriptive study done on a representative
group of people at a single moment in time.


• Descriptive designs: Research designs that describe in detail the
phenomena under study.


• Exploratory research design: Loosely structured research design to
explore and gain clarity about the research questions.


• Focus group discussion: A sociological method in which 6-10 people
discuss the topic being researched.


• Judgemental analysis: Formation of a judgement based upon personal
impressions rather than facts.


• Longitudinal designs: A single sample studied over a longer period of
time. There are periodic measurements done of the study variable.


• Test unit: A unit on which treatment is applied.


3.10 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 
1. How would you define research designs? What are the three principles to 
be taken care of when selecting a research design? 
2. Distinguish between internal and external validity of the experiments. 


3. What are the various sources of errors?

Long-Answer Questions
1. What are exploratory designs? What are the methods that can be used in 
an exploratory design? 


2. What are descriptive designs? What are the different kinds of descriptive 
designs available? 


3. Explain the four types of experimental designs.


4.0 INTRODUCTION


In the last unit, we discussed research design and its various aspects. Once the
research design is in place, it is time to address the research problem and test
the hypotheses. But this cannot be done unless one collects the relevant
information necessary for arriving at any suitable conclusions. The information
thus collected is usually termed data. The researcher has a choice of a wide
variety of methods to collect it. It has to be remembered that there might be a
lot of information available on the topic under study; however, you need to pick
only that information which is of direct relevance to the current problem under
study.


The researcher can make use of data that has been collected and compiled
earlier or, alternatively, make use of methods that are problem specific. The decision
to choose one over the other or to use a combination of methods depends on a
number of deciding criteria. This unit will begin by making the reader aware of the
methods of data collection available for research. Next, we will discuss the
secondary data methods and then go on to discuss three widely used primary
data methods: observation, focus group discussion and interviews. The most popular
and widely used method of primary data collection is the questionnaire method.
This will be dealt with at length in Unit 6.


4.1 OBJECTIVES 


After going through this unit, you will be able to: 


• Distinguish between different types of primary and secondary sources of
data.

• Explain the relevance of secondary data in research.

• Identify the different types and sources of secondary data.

• Describe the method and uses of observation method.

• Discuss the method of focus group discussion.

• Identify and use the interview method for data collection.


4.2 CLASSIFICATION OF DATA 


To understand the number of choices available to a researcher for collecting the 
study-specific information, one needs to be fully aware of the resources available 
for the study and the level of accuracy required. To appreciate the truth of this 
statement, one needs to examine the variety of methods available to the researcher. 
The data sources could be either problem specific and primary or historical and 
secondary in nature (Figure 4.1). 


[Figure: data sources, branching into primary methods and secondary methods; the
secondary branch covers internal and external sources, with electronic and
syndicated sources also shown]

Fig. 4.1 Sources of Research Information


Primary data, as the name suggests, is original, problem- or project-specific and
collected for the specific objectives and needs spelt out by the researcher. Its
accuracy and relevance are reasonably high. The time and money required for this
are quite high, and sometimes a researcher might not have the resources or the
time or both to go ahead with this method. In this case, the researcher can look at
alternative sources of data which are economical and reliable enough to take the
study forward. These include the second category of data sources, namely
secondary data.


Secondary data, as the name implies, is information that is not topical or
research-specific and has been collected and compiled by some other researcher
or investigative body. This type of data is recorded and published in a structured
format, and thus is quicker to access and manage. Secondly, in most instances,
unless it is a data product, it is not too expensive to collect. The information required
is readily available as a data product or as audit information, which the researcher
or the organization can obtain and use for arriving at quick decisions. In comparison
with original research-centric data, secondary data can be collected economically
and in a short span of time. However, one must remember that it is somewhat
lower on accuracy, because what is primary and original for one researcher
essentially becomes secondary and historical for someone else.


Table 4.1 gives a snapshot of the major differences between the two methods.

Table 4.1 Primary vs Secondary Data

Criterion | Primary Data | Secondary Data
Collection purpose | For the problem at hand | For other problems
Collection process | Very involved | Rapid & easy
Collection cost | High | Relatively low
Collection time | Long | Short


4.3 SECONDARY DATA 


We have already discussed what secondary data is. Let us now look at its uses,
types and sources.


4.3.1 Uses of Secondary Data 


Secondary data can be used for multiple purposes as follows:

• Problem identification and formulation stage: Existing information on
the topic under study is useful to help develop the research question.

• Hypotheses designing: Previous research studies done in the area could
help in hypothesizing about expected results.

• Sampling considerations: There might be respondent related databases
available to seek respondent statistics and relevant contact details. These
would help during sampling for the study.

• Primary base: The secondary information collected can be used to design
the primary data collection instruments, in order to phrase and design the
right questions.

• Validation board: Earlier records and studies can also be used to support
or validate the information collected through primary sources.


Before we examine the wide range of the secondary sources available to 
the business researcher, it is essential that one is aware of the advantages and 
disadvantages of using secondary sources. 


4.3.2 Advantages and Disadvantages of Secondary Data 


There are multiple advantages of using secondary data.


• Resource advantage: Any research that makes use of secondary
information will be able to save immensely in terms of both cost and time.


• Accessibility of data: The other major advantage of secondary sources is
that it is very easy to access this data.


• Accuracy and stability of data: Data from recognized sources has the
additional advantage of accuracy and reliability.


• Assessment of data: It can be used to compare and support the primary
research findings of the present study.


However, there is a need for caution as well, because using secondary data
might have some disadvantages, such as:


• Applicability of data: The information might not be directly suitable for
our study. Also, since it is past data, it might not be applicable today.


• Accuracy of data: All data that is available might not be reliable and
accurate.


4.3.3 Types and Sources of Secondary Data 


As we saw earlier in Figure 4.1, secondary data can be divided into internal and
external sources. An internal source, as the name implies, is organization- or
environment-specific and includes the historical output and records available with the
organization which might be the backdrop of the study. The data that is independent
of the organization and covers the larger industry-scape would be available in the
form of published material, computerized databases or data compiled by syndicated
services. Discussed below are the major sources of data: internal, external,
computer-stored data and syndicated databases.


1. Internal sources of data 


Compilation of various kinds of information and data is mandatory for any 
organization that exists. Some sources of internal information are presented in 
Figure 4.2. 


[Figure: internal data, branching into company records, employee records, sales
data, financial records and other publications]

Fig. 4.2 Internal Sources of Data


• Company records: This includes all the data about the inception, the
owners, the mission and vision statements, infrastructure and other
details, including both the process and manufacturing (if any) and sales, as
well as a historical timeline of the events.


• Employee records: All details regarding the employees (regular and
part-time) would be part of employee records.


• Sales data: This data can take on different forms:

(i) Cash register receipts

(ii) Salespersons' call records: This is a document to be prepared and
updated every day by each individual salesperson.

(iii) Sales invoices: These record the complete details of a customer who has
placed an order with the company, including the size of the order, location,
price by unit, terms of sale and shipment details (if any).


• Financial records and sales reports


Besides this, there are other published sources like warranty records, CRM
data and customer grievance data which are extremely critical in evaluating
the health of a product or an organization.


2. External data sources 


As stated earlier, information that is collected and compiled by a source
external to the organization is referred to as an external source of data. External
sources of data include the following:


Published data: There are two kinds of published data: one that comes from
official and government sources, and the other that has been prepared by
individuals, private agencies or organizations.


Government sources: The Indian government publishes a lot of documents that
are readily available and are extremely useful for the purpose of providing
background data. A brief snapshot of some government data is given in Table 4.2.

Table 4.2 Secondary Data: Government Publications

Sub-type | Sources | Data | Uses
1. Census data, conducted every ten years throughout the country | Registrar General of India, conducting census survey http://censusindia.gov.in/ | Size of the population and its distribution by age, sex, occupation and income levels. 2010 census is taking many more variables to get a better picture of the population | Population information is significant as forecasts of purchase, estimates of growth and development, as well as policy decisions can be made on this base
2. Statistical Abstract India, annually | CSO (Central Statistical Organization), for the past 5 years http://www.mospi.gov.in/cso_test1.htm | Education, health and residential information at the state level is part of this document | Demand estimations and a state-level assessment of government support and policy changes can be made
3. White paper on national income | CSO http://www.mospi.gov.in/cso_test1.htm | Estimates of national income, savings and consumption | Significant indication of the financial trends; investment forecasts and monetary policy formulation
4. Annual Survey of Industries, all industries | CSO http://www.mospi.gov.in/cso_test1.htm | No. of units, persons employed, capital output ratio, turnover, etc. | Information on existing units gives perspective on industrial development and helps in creating the employee profile
5. Monthly survey of selected industries | CSO http://www.mospi.gov.in/cso_test1.htm | Production statistics in detail | Demand-supply estimations
6. Foreign Trade of | Director General of Commercial Intelligence | Exports and Imports | Forecast, manufacturing
7. Wholesale price index, weekly all India; Consumer Price Index | Ministry of Commerce and Industry http://india.gov.in/sectors/commerce/ministry_commerce.php | Reporting of prices of products like food articles, foodgrains, minerals, fuel, power, lights, lubricants, textiles, chemicals, metal, machinery and transport | Establishing price bands of product categories; pricing estimations for new products; determining consumer spend
8. Economic Survey, annual publication | Dept. of Economic Affairs, Ministry of Finance, patterns, currency and finance http://finmin.nic.in/the_ministry/dept_eco_affairs/ | Descriptive reporting of the current economic status | Estimations of the future and evaluation of policy decisions and extraneous factors in that period
9. National Sample Survey (NSS) | Ministry of Planning http://www.planningcommission.gov.in/ | Social, economic, demographic, industrial and agricultural statistics | Significant for making policy decisions as well as studying sociological patterns

Other data sources: This source is the most voluminous and the most frequently used in every research study. The information could be:

• Books and periodicals

• Guides, including industry guides

• Directories and indices

• Standard non-governmental statistical data: Some non-government data sources are presented in Table 4.3.


Table 4.3 Secondary Data: Non-government Publications

Sub-type | Sources | Data | Uses

1. | Company Working Results: Stock Exchange Directory | Bombay Stock Exchange, http://www.bseindia.com/ | A complete database of the companies registered with the stock exchange and comprehensive details about stock policies and current share prices | Significant in determining the financial health of various sectors as well as assessment of corporate funding and predictions of outcomes

2. | Status reports by various commodity boards | The commodity board or the industry associations like Jute Board, Cotton Industry, Sugar Association, Pulses Board, Metal Board, Chemicals, Spices, Fertilizers, Coir, Pesticides, Rubber, Handicrafts, Plantation Boards, etc. | Detailed information on current assets in terms of units, current production figures and market condition | These are useful for individual sectors in working out their plans as well as evaluating causes of success or failure

3. | Industry Associations on problems faced by private sector, etc. | FICCI, ASSOCHAM, AIMA, Association of Chartered Accountants and Financial Analysts, Indo-American Chamber of Commerce, etc., http://www.ficci.com/, http://www.assocham.org/, http://www.aima-ind.org/, www.iaccindia.com/ | Cases/comprehensive reports by the supplier or user or any other section associated with the sector | Cognizance of the gaps and problems in the effective functioning of the organization; trouble shooting

4. | Export related data (commodity wise) | Leather Exports Promotion Council, Apparel Export Promotion Council, Handicrafts, Spices, Tea, etc., Exim Bank, etc., http://www.leatherindia.org/, http://www.aepcindia.com/ | Product and country wise data on the export figures as well as information on existing policies related to the sector | To estimate the demand; gauge opportunities for trade and impetus required in terms of manufacturing and policy changes

5. | Retail Store Audit on pharmaceutical, veterinary, consumer products | ORG (Operations Research Group); monthly reports on urban sector, quarterly reports on rural sector | The touch point for this data is the retailer, who provides the figures related to product sales; the data is very comprehensive and covers most brands. The data is region specific and covers both inventory and goods sold | Market analysis and market structure mapping with estimations of market share of leading brands. The audit can also be used to study consumption trends at different time periods or subsequent to sales promotion or other activities

6. | National Readership Surveys (NRS) | IMRB survey of reading behaviour for different segments as well as different products, http://www.imrbint.com/ | Today these surveys are done by various bodies with different sample bases. Today the survey base has become younger, with the age of the reader lowered to 12+ | Media planning and measuring exposure as well as reach for product categories

7. | THOMPSON INDICES: Urban market index, Rural market index | Hindustan Thompson Associates | All towns with a population of more than one lakh are covered and information on demographic and socio-economic variables is given for each city with Mumbai as base. The rural index similarly covers about 400 districts with socio-economic indicators like value of agriculture output, etc. | The inclinations to purchase consumer products are directly related to the socio-economic development of communities in general. The indices provide barometers to measure such potentials for each city and have implications for the researcher in terms of data collection sources


3. Computer-stored data 


Information today is also available in an electronic form. The databases available 
to the researcher can be classified on the basis of the type of information or by the 
method of storage and recovery as described below. Figure 4.3 gives a classification 
of the sources of computerized data. 


• Reference databases: These refer users to the articles, research papers, abstracts and other printed news contained in other sources. They provide online indices and abstracts and are thus also called bibliographic databases.

• Source databases: These provide numerical data, complete text, or a combination of both.


Based on storage and recovery mechanisms: Another useful way of classifying 
databases is based on their method of storage and retrieval. 


• Online databases: These can be accessed in real time directly from the producers of the database or through a vendor. Examples include ABI/Inform, EBSCO and Emerald.

• CD-ROM databases: Here information is available on a CD-ROM.

[Figure: computer-based information classified by information type and by storage and recovery of information (on-line: direct from suppliers, direct from creator, or through other networks; CD-ROM/pen)]

Fig. 4.3 Classifications of Computerized Databases


4. Syndicated data sources 


Syndicated service agencies are organizations that collect organization/product- 
category-specific data from a regular consumer base and create a common pool 
of data that can be used by multiple buyers, for their individual purpose. 


There are different ways to classify syndicated sources.

• Household/individual data: These could be in the form of surveys or panel data available through reputed agencies.

• Surveys: Surveys are usually one-time assessments conducted on a large representative respondent base. Examples include opinion polls before elections and rankings of the best business schools to study at.

• Product purchase panels: These specially selected respondent groups specifically record certain identified purchases, generally related to household products and groceries.

• Media-specific panels: Panels are also created for collecting information related to promotion and advertising. The task of the media panel is to make use of different kinds of electronic equipment to automatically record consumer viewing behaviour. These records are used to calculate the television rating points (TRP) of different programmes.

• Scanner devices and individual source systems: To overcome the problems of panel data, research agencies provide a newer service through electronic scanner devices, e.g. sales volume tracking data.

• Institutional syndicated data: Syndicated data can also be available at the institutional level. Retailer and wholesaler audits are examples of this kind. Usually the records are noted as:
Beginning stocks + deliveries - ending inventory = sales for the period
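The audit identity above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AuditRecord` layout, the product names and all figures are hypothetical and do not come from any real audit.

```python
from dataclasses import dataclass


@dataclass
class AuditRecord:
    """One product's audit figures for a single period (hypothetical layout)."""
    product: str
    beginning_stock: int   # units held at the start of the period
    deliveries: int        # units received during the period
    ending_inventory: int  # units counted at the end of the period


def sales_for_period(record: AuditRecord) -> int:
    """Beginning stocks + deliveries - ending inventory = sales for the period."""
    sales = record.beginning_stock + record.deliveries - record.ending_inventory
    if sales < 0:
        # More stock at the end than was ever available points to a recording error.
        raise ValueError(f"inconsistent audit figures for {record.product}")
    return sales


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    audit = [
        AuditRecord("Tea 250 g", beginning_stock=120, deliveries=200, ending_inventory=90),
        AuditRecord("Soap 100 g", beginning_stock=300, deliveries=150, ending_inventory=210),
    ]
    for rec in audit:
        print(rec.product, sales_for_period(rec))
```

The negative-sales check is a design choice of this sketch, not part of the audit method itself; it simply flags figures that cannot all be right at once.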


Check Your Progress 


1. Which type of data has a significant time and cost advantage? 


2. How is the secondary information collected helpful for primary data collection instruments?


3. Cash registers are examples of which type of secondary data sources? 


4. What are syndicated service agencies? 


4.4 PRIMARY DATA COLLECTION: 
OBSERVATION METHOD 


The researcher has available to him/her a wide variety of data collection methods which are primary or problem specific in nature. In this unit, however, we discuss only the major and most often used methods: the observation method, the focus group discussion and the interview method. The questionnaire method is the most commonly used method of primary data collection, and we will focus on it in detail in Unit 6. Let us discuss some of the other widely used methods now.


Observation is a direct method of collecting primary data. It is one of the 
most appropriate methods to use in case of descriptive research. The method of 
observation involves viewing and recording individuals, groups, organizations or 
events in a scientific manner in order to collect valuable data related to the topic 
under study. 


The mode of observation could be standardized or structured observation. Here, the nature of the content to be recorded, the format and the broad areas of recording are predetermined. Thus, the observer’s bias is reduced and the authenticity and reliability of the information collected are higher. For example, Fisher Price carries out an observational study whenever it comes out with a new toy. The observer is supposed to record the appeal of the toy for a child.


The opposite of this is called unstructured observation. Here, the observer is supposed to make a note of whatever he understands as relevant for the research study. This kind of approach is more useful in exploratory studies. Since it lacks structure, the chances of observer’s bias are high. An example of this is the observation of consumers at a bank, a restaurant or a doctor’s clinic.

However, it is critical here to understand that the researcher must have a preconceived plan to capture the observations made. It is not to be treated as a blank sheet where the observer reports what he sees. The aspects to be observed must be clearly listed, as in an audit form, or they could be indicative areas on which the observation is to be made.


Another way of distinguishing observations is by whether or not the respondent is aware of being observed. The observation might be disguised; here it is done without the knowledge of the respondent, who has no idea that he/she is being observed. This can also be done with devices like a one-way mirror, a hidden camera or a recorder. The only disadvantage is that this is ethically an intrusion of an individual’s right to privacy. On the other hand, the knowledge that the person is under observation can be conveyed to the respondent, and this is undisguised observation. The decision to choose one over the other depends upon the nature of the study.


The observation method can also be distinguished on the basis of the setting 
in which the information is being collected. This could be natural observation, 
which as the name suggests, is carried out in actual real life locations, for example 
the observations of how employees interact with each other during lunch breaks. 
On the other hand, it could be an artificial or simulated environment in which the 
respondent is to be observed. This is actively done in the armed forces where 
stress tests are carried out to measure an individual’s tolerance level. 


There is another differentiation, where the observation could be done by a human observer or a mechanical device.


Human observation: As the name suggests, this technique involves observation 
and recording done by human observers. The task of the observer is simple and 
predefined in case of a structured observation study as the format and the areas to 
be observed and recorded are clearly defined. In an unstructured observation, the 
observer records in a narrative form the entire event that he has observed. 


Mechanical observation: In these methods, man is replaced by machine. Some examples are:

• Store cameras and cameras in banks and other service areas.

• Universal product code (UPC) scanned by electronic scanners in stores.

• Psychogalvanometer, which measures galvanic skin response (GSR) or changes in the electrical resistance of the skin. Thus, the respondent could be exposed to different kinds of packaging, advertisements and product composition, to note his/her reaction to them.

• Eye-tracking equipment, such as oculometers, eye cameras or eye view minuters, which records the movements of the eye. The oculometer determines what the individual is looking at, while the pupilometer measures the interest of the person in the stimulus. The pupilometer measures changes in the diameter of the respondent’s pupils.

• Trace analysis: In this, the remains or leftovers of the consumer’s basket, like his credit card spend, the recycle bin on his computer or his garbage (garbology), are evaluated to measure current trends and patterns of usage and disposal.

Observational techniques are an extremely useful method of primary data collection and are always a part of the inputs, whether accompanying other techniques, like interviews, discussions or questionnaire administration, or as the prime method of data collection. However, the disadvantage from which they suffer is that they are always behaviourally driven and cannot be used to investigate the reasons or causes of the observed behaviour. Another problem is that if one is observing the occurrence of a certain phenomenon, one has to wait for the event to occur. One alternative to this is to study the recordings, whether verbal, written or audio-visual, in order to formulate the study-related inferences.


4.5 PRIMARY DATA COLLECTION: FOCUS 
GROUP DISCUSSION 


Focus group discussion (FGD) is a highly versatile and dynamic method of collecting primary data from a representative group of respondents. The process generally involves a moderator who steers the discussion on the topic under study. A group of carefully selected respondents is specifically invited and gathered at a neutral setting. The moderator initiates the discussion and then the group carries it forward by holding a focused and interactive discussion.


Key elements of a focus group 


• Size: The ideal recommended size for a group discussion is 8 to 12 members. Fewer than eight would not generate all the possible perspectives on the topic or the group dynamics required for a meaningful session, and more than 12 would make it difficult to get any meaningful insight.

• Nature: Individuals from a similar background, in terms of demographic and psychographic traits, must be included; otherwise disagreement might emerge as a result of factors other than the one under study. The other requirement is that the respondents must be similar in terms of subject/policy/product knowledge and experience with the product under study. Moreover, the conduct of the focus group discussion must ensure that the following criteria are taken care of:

• Acquaintance: It has been found that participants knowing each other in a group discussion is disruptive and hampers the free flow of the discussion. It is recommended that the group should consist of strangers rather than subjects who know each other.

• Setting: The space or setting in which the discussion takes place should be as neutral, informal and comfortable as possible. In case one-way mirrors or cameras are installed, there is a need to ensure that these gadgets are not directly visible.

• Time period: The discussion should be held in a single sitting unless there is a ‘before’ and ‘after’ design, which requires the group’s perceptions initially, before the study variable is introduced, and again later, in order to gauge the group’s reactions. The ideal duration should not exceed an hour and a half. This is usually preceded by a short rapport formation session between the moderator and the group members.

• The recording: This is most often machine recording, even though sometimes it may be accompanied by human recording as well.

• The moderator: The moderator is the one who manages the discussion. He might be a participant in the group discussion or he might be a non-participant. He must be a good listener and unbiased in his conduct of the discussions.

Steps in planning and conducting focus groups

The focus group has to be conducted in a stepwise manner:

• Clearly define and list the research objectives of the study that requires group discussion.

• A comprehensive, structured moderator’s outline for conducting the whole process needs to be charted out.

• After this, the actual focus group discussion is carried out.

• The findings are summarized under different heads, as indicated in the focus group objectives, and reported in a narrative form. This may include expressions like ‘majority of the participants were of the view’ or ‘there was a considerable disagreement on this issue’.


Types of focus groups 


The researcher has different kinds of group discussion methods available to him or her. These are:

• Two-way focus group: Here one respondent group sits and listens to the other and, after learning from them or understanding the needs of the group, they carry out a discussion amongst themselves. For example, in a management school, the faculty group could listen to the opinions and needs of the student group.

• Dual-moderator group: Here, there are two different moderators; one responsible for the overt task of managing the group discussion and the other for the second objective of managing the ‘group mind’ in order to maximize the group performance.

• Fencing-moderator group: The two moderators take opposite sides on the topic being discussed and thus, in the short time available, ensure that all possible perspectives are thoroughly explored.

• Friendship groups: There are situations where the comfort level of the members needs to be high so that they give meaningful responses. This is especially the case when a supportive peer group encourages admission about the related organizations or people/issues.

• Mini-groups: These groups might be of a smaller size (usually four to six) and are usually expert groups/committees that, on account of their composition, are able to decisively contribute to the topic under study.

• Creativity group: These usually last longer than one and a half hours and might take the workshop mode. Here, the entire group is instructed, after which they brainstorm in smaller sub-groups. They then reassemble to present their sub-group’s opinion. This might also stretch across a day or two.

• Brand-obsessive group: These are special respondent sub-strata who are passionately involved with a brand or product category (say, cars). They are selected as they can provide valuable insights that can be successfully incorporated into the brand’s marketing strategy.

• Online focus group: This is a recent addition to the methodology and is extensively used today. Here, the respondents assemble at the designated time in a web-based chat room and enter their ID and password to log on. The discussion between the moderator and the participants is in real time.


4.6 PRIMARY DATA COLLECTION: PERSONAL 
INTERVIEW METHOD 


Personal interview is a one-to-one interaction between the investigator/interviewer 
and the interviewee. The purpose of the dialogue is research specific and ranges 
from completely unstructured to highly structured. 

Uses of the interview method 

The interview has varied applications in business research and can be used 
effectively in various stages. 

• Problem definition: The interview method can be used right at the beginning of the study. Here, the researcher uses the method to get better clarity about the topic under study.

• Exploratory research: Because the structure is loose, this method can be actively used here.

• Primary data collection: There are situations when the method is used as a primary method of data collection. This is generally the case when the area to be investigated is high on emotional responses.


The interview process 


The steps undertaken for the conduction of a personal interview are somewhat 
similar in nature to those of a focus group discussion. 

Interview objective: The information needs that are to be addressed by the 
instrument should be clearly spelt out as study objectives. This step includes a 
clear definition of the construct/variable(s) to be studied. 

Interview guidelines: A typical interview may take from 20 minutes to close to an hour. A brief outline to be used by the investigator is formulated depending upon the contours of the interview.

Structure: Based on the needs of the study, the actual interview may be unstructured, semi-structured or structured.

• Unstructured: This type of interview has no defined guidelines. It usually begins with a casually worded opening remark like ‘so tell us/me something about yourself’. The direction the interview will take is not known even to the researcher. The probability of subjectivity is very high.

• Semi-structured: This has a more defined format, and usually only the broad areas to be investigated are formulated. The questions, sequence and language are left to the investigator’s choice. Probing is of critical importance in obtaining meaningful responses and uncovering hidden issues. After the initial question is asked, the direction of the interview is determined by the respondent’s initial reply, the interviewer’s probes for elaboration and the respondent’s answers.

• Structured: This format has the highest reliability and validity. There is considerable structure to the questions, and the questioning is also done on the basis of a prescribed sequence. Structured interviews are sometimes also used as the primary data collection instrument.


Interviewing skills: The quality of the output and the depth of information 
collected depend upon the probing and listening skills of the interviewer. His attitude 
needs to be as objective as possible. 


Analysis and interpretation: The information collected is not subjected to any statistical analysis. Mostly the data is in narrative form; in the case of structured interviews, it might be summarized in prose form.


Types of interviews 


There are various kinds of interview methods available to the researcher. In the 
last section we have spoken about a distinction based on the level of structure. 
The other classification is based on the mode of administering the interview. 


Figure 4.3 presents a classification of the types of personal interview. 


[Figure: Interview Methods, divided into Telephone Interviewing and Personal Interviewing, each with further sub-types]

Fig. 4.3 Types of Personal Interview


Personal methods: These are the traditional one-to-one methods that have been used actively in all branches of social sciences. However, they are distinguished in terms of the place where they are conducted.

• At-home interviews: This face-to-face interaction takes place at the respondent’s residence. Thus, the interviewer needs to initially contact the respondent to ascertain the interview time.

• Mall-intercept interviews: As the name suggests, this method involves conducting interviews with the respondents as they are shopping in malls. Sometimes, product testing or product reactions can be carried out through structured methods and followed by 20-30 minute interviews to test the reactions.

• Computer-assisted personal interviewing (CAPI): This technique is carried out with the help of the computer. In this form of interviewing, the respondent faces an assigned computer terminal and answers a questionnaire on the computer screen by using the keyboard or a mouse. A number of pre-designed packages are available to help the researcher design simple questions that are self-explanatory and, instead of probing, the respondent is guided to a set of questions depending on the answer given. There is usually an interviewer present at the time of the respondent’s computer-assisted interview, who is available for help and guidance, if required.
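The way a computer-assisted interview guides the respondent to the next set of questions depending on the answer given can be sketched as a small routing table. This is a minimal sketch, not the design of any actual CAPI package: the question texts, the `QUESTIONS` table and the `run_interview` helper are all hypothetical.

```python
# Each question names the next question for every allowed answer.
# None means the interview ends. Question texts are illustrative only.
QUESTIONS = {
    "q1": {"text": "Do you own a car?", "next": {"yes": "q2", "no": "q3"}},
    "q2": {"text": "Was it bought new?", "next": {"yes": None, "no": None}},
    "q3": {"text": "Do you plan to buy a car this year?", "next": {"yes": None, "no": None}},
}


def run_interview(answers, start="q1"):
    """Walk the routing table and return the (question id, answer) pairs asked.

    `answers` maps question ids to the respondent's replies; in a live
    session these would come from the keyboard or mouse instead.
    """
    asked = []
    current = start
    while current is not None:
        question = QUESTIONS[current]
        reply = answers[current]
        if reply not in question["next"]:
            raise ValueError(f"{current}: unexpected answer {reply!r}")
        asked.append((current, reply))
        current = question["next"][reply]  # route on the answer just given
    return asked


if __name__ == "__main__":
    # A respondent who does not own a car skips q2 and is routed to q3.
    print(run_interview({"q1": "no", "q3": "yes"}))
```

Keeping the routing in data rather than in nested conditionals means the questionnaire can be changed without touching the interviewing loop.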

Telephone method: The telephone method replaces the face-to-face interaction between the interviewer and interviewee by calling up the subjects and asking them a set of questions. The advantage of the method is that geographic boundaries are not a constraint and the interview can be conducted at the individual respondent’s location. The format and sequencing of the questions remain the same.

• Traditional telephone interviews: The process can be accomplished using the traditional telephone for conducting the questioning.

• Computer-assisted telephone interviewing: In this process, the telephonic interview is conducted using a computerized interview format. The interviewer sits in front of a computer terminal and wears a mini-headset in order to hear the respondent’s answers. However, unlike the traditional method, where he had to manually record the responses, the responses are simultaneously recorded on the computer.


Since the interview requires a one-to-one dialogue to be carried out, it is more cumbersome and costly as compared to a focus group discussion. Also, conducting an interview requires considerable skill on the part of the interviewer, and thus adequate training in interviewing skills is needed for capturing comprehensive study-related data.

Questionnaire Method

It is one of the most cost-effective methods, which can be used with considerable ease by most individual and business researchers. It has the advantage of flexibility of approach and can be successfully adapted for most research studies. The instrument has been defined differently by various researchers. Some take the traditional view of a written document requiring the subject to record his/her own responses (Kervin, 1999); others have taken a broader perspective to include the structured interview also as a questionnaire (Bell, 1999). It is essentially a data-collection instrument that has a pre-designed set of questions, following a particular structure (De Vaus, 2002). Since it includes a standard set of questions, it can be successfully used to collect information from a large sample in a reasonably short time period.

However, a note of caution is to be sounded here, as the usage of 
questionnaire as the best method in all research studies is not a foregone 
conclusion. For example, at the exploratory stage, when one is still trying to 
identify the information areas, variables and execution decision, it is advisable 
to use a more unstructured interview. Secondly, when the number of 
respondents is small and one needs to collect more subjective data and most 
of the questions to be asked are open-ended, then a standardized questionnaire 
is not advisable. 


When one is designing the questionnaire, there are certain criteria that must 
be kept in mind. 


Criteria for Questionnaire Designing 


The first and foremost requirement is that the spelt-out research objectives must be converted into clear questions which will extract answers from the respondent. This is not as easy as it sounds. For example, suppose one wants to know the margin that a company gives to the retailer. This cannot be converted into a direct question, as no one will give the correct figure. Thus, one will have to ask a disguised question, perhaps offering a range of percentage estimates (2-5 per cent, 6-10 per cent, 11-15 per cent, 16-20 per cent, etc.); otherwise the retailer might not go beyond a yes, a no or ‘industry standard’.


The second requirement is that, like the Toyota questionnaire, it should be designed to engage the respondent and encourage a meaningful response. For example, a questionnaire measuring stress cannot have a voluminous set of questions which fatigue the subject. The questions, thus, should be non-threatening, must encourage response and be clear to understand. One needs to remember that the essential use of the instrument is to administer it to a large base; thus clarity and interest should be part of the measure itself.


Lastly, the questions should be self-explanatory and not confusing, as then the answers one gets might not be accurate or usable for analysis. This will be discussed in detail later, when we discuss the wording of the questions.

Types of questionnaire


There are many different types of questionnaire available to the researcher. The 
categorization can be done on the basis of a variety of parameters. The two which 
are most frequently used for designing purposes are the degree of construction or structure and the degree of concealment of the research objectives. Construction or formalization refers to the degree to which the response categories have been defined. Concealment refers to the degree to which the purpose of the study is explained or is clear to the respondent.


Instead of considering them as individual types, most research studies use a 
mixed format. Thus, they will be discussed here as a two-by-two matrix (Table 
4.4). 

Table 4.4 Types of Questionnaire

 | FORMALIZED | NON-FORMALIZED
UNCONCEALED | Most research studies use questionnaires like these | The response categories have more flexibility
CONCEALED | Used for assessing psychographic and subjective constructs | Questionnaires using projective techniques or sociometric analysis


Formalized and unconcealed questionnaire: This is the one that is most frequently, and often indiscriminately, used by management researchers. For example, if a new brokerage firm wants to understand the investment behaviour of the population under study, it would structure the questions and answers as follows:

1. Do you carry out any investment(s)?
   Yes ____ No ____
   If yes, continue, else terminate.

2. Out of the following options, where do you invest? (Tick all that apply.)
   Precious metals ____, real estate ____, stocks ____, government instruments ____, mutual funds ____, any other ____

3. Who carries out your investments?
   Myself ____, agent ____, relative ____, friend ____, any other ____
   In case the option ticked is self, please go to Q. 4, else skip.

4. What is your source of information for these decisions?
   Newspaper ____, investment magazines ____, company records, etc. ____, trading portals ____, agent ____


This kind of structured questionnaire is easy to administer: the questions are self-explanatory and, since the answer categories are defined as well, the respondent only needs to read and tick the right answer. Another advantage of this form is that it can be administered effectively to a large number of people at the same time. Data tabulation and data analysis are also easier than in other methods.
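Because the answer categories are fixed in advance, tabulation reduces to counting ticks. A minimal sketch of this for a "tick all that apply" question is given below; the four respondents and their answers are invented for illustration, and the helper names `tabulate` and `percentages` are hypothetical.

```python
from collections import Counter

# Hypothetical answers to a "tick all that apply" question,
# one set of ticked options per respondent.
responses = [
    {"stocks", "mutual funds"},
    {"real estate"},
    {"stocks", "precious metals"},
    {"stocks", "real estate", "mutual funds"},
]


def tabulate(responses):
    """Count how many respondents ticked each option."""
    counts = Counter()
    for ticked in responses:
        counts.update(ticked)  # each option is counted once per respondent
    return counts


def percentages(counts, n_respondents):
    """Share of respondents choosing each option, in per cent."""
    return {option: 100 * count / n_respondents for option, count in counts.items()}


if __name__ == "__main__":
    counts = tabulate(responses)
    for option, share in sorted(percentages(counts, len(responses)).items()):
        print(f"{option}: {counts[option]} respondents ({share:.0f} per cent)")
```

Since respondents may tick several options, the percentages are shares of respondents and need not add up to 100.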


This format, as a consequence of its predefined composition, is able to produce relatively stable results and is reasonably high in its reliability. The validity, of course, would be limited, as structured and limited responses might not capture the comprehensive meaning of the constructs and variables under study. In such cases, variables are made a part of the study, and some open-ended questions, as well as additional instructions or probing by the field investigator during administration, could help in getting better results.


Formalized and concealed questionnaire: The research studies which are trying 
to unravel the latent causes of behaviour cannot rely on direct questions. Thus, the 
respondent has to be given a set of questions that can give an indication of what 
are his basic values, opinions and beliefs, as these would influence how he would 
react to certain products or issues. For example, a publication house that wants to 
launch a newspaper wants to ascertain what are the general perceptions and current 
attitudes about newspapers. Asking a direct question would only reveal apparent 
information, thus, some disguised attitudinal questions would need to be asked in 
order to infer this. 


SA = Strongly Agree; A = Agree; N = Neutral; D = Disagree; SD = Strongly Disagree

                                                                     SA  A  N  D  SD

1 | The individual today is better informed about everything than
    before.

2 | I believe that one must live for the day and worry about
    tomorrow later.

3 | An individual must at all times keep abreast of what is happening
    in the world around him/her.

4 | Books are the best friends anyone can have.

5 | I generally read and then decide what to buy.

6 | My lifestyle is so hectic that I do not have time for reading the
    newspaper.

7 | The advent of radio, television and the Internet has made
    traditional information sources, like newspapers, redundant.

8 | A man/woman is known by what he/she reads.


The logic behind these tests of attitude is that the questions do not seem to 
be in a particular direction and are apparently non-threatening, thus the respondent
gives an answer which would be in the general direction of his/her attitudes. 


The advantage of these questions is that since these are structured, one can 
ascertain their impact and quantify the same through statistical techniques. Secondly, 
it has been found that psychographic questions like these increase the subject 
coverage and improve the validity of the instrument as well. Most studies interested 
in quantifying the primary response data make use of questions that are designed 
both as formalized unconcealed and formalized concealed. 


Non-formalized unconcealed: Some researchers argue that the respondent is
not really cognizant of his/her attitude towards certain things. Also, the formalized
method asks the respondent to give structured responses to attitudinal statements
that essentially express attitudes in a manner that the researcher or experts think
is the correct way. This, however, might not be the way the person thinks. Thus,
rather than giving the respondent pre-designed response categories, it is better to
give unstructured questions where he/she has the freedom of expressing
himself/herself in the way he/she wants to. Some examples of these kinds of
questions are given below:


1. What has been the reason for the success of the ‘lean management drive’ 
that the organization has undertaken? Please specify FIVE most significant 
reasons according to YOU. 


(a) 
(b) 
(c) 
(d) 
(e) 
2. Why do you think Maggi noodles are liked by young children? 


3. How do you generally decide on where you are going to invest your money? 


4. Give THREE reasons why you believe that the Commonwealth 2010 Games 
have helped the country? 


The advantage of the method is that the respondent can respond in any way
he/she believes is important. For example, for the last question, some people might
respond by stating that it has boosted tourism in the country and contributed to the
country’s economy. Some might think it will encourage more international events
to be held in the country. Some might also state that it is not a good idea and the 
government should instead be spending on improving the cause of the people who 
are below the poverty line. 
Thus, one gets a comprehensive perspective on what the construct/product/
policy means to the population at large and, at the micro level, what it means to
people in different segments. The validity of these measures is higher than that of
the previous two. However, quantification is a little tedious and one cannot go
beyond frequencies and percentages to represent the findings. The other problem
is the researcher’s bias, which might lead to clubbing responses into categories
which might not be homogeneous in nature.


Non-formalized, concealed: If the objective of the research study is to 
uncover socially unacceptable desires and latent or subconscious and 
unconscious motivations, the investigator makes use of questions of low 
structure and disguised purpose. The presumption behind this is that if the 
argument, the situation or question is ambiguous, it is most likely that the 
revelation it would result in would be more rich and meaningful. In Chapter 6, 
there was a discussion on projective techniques; these kinds of questionnaires 
are designed on the above-stated lines. The major weakness of these types of 
questionnaires is that being of a low structure, the interpretation required is 
highly skilled. Cost, time and effort are additional elements which might curtail 
the use of these techniques. For example, in a study conducted to determine which
segment men’s personal care toiletries (especially moisturizers and fairness
creams) should be targeted at, the investigator designed two typical bachelors’
shopping lists. One had a number of monthly grocery products as well as the
normal male toiletries like shaving blades, gels, shampoos, etc. The other list had
the same grocery products and male toiletries but it had two additional items:
Fair and Handsome fairness cream and a sensitive skin moisturizer. The list was
given to 20 young men, who were asked to conceptualize/describe the person
whose list it was. The answers obtained were as follows:


List with Cream and Moisturizer                | List without Cream and Moisturizer
65 per cent said this person was good looking  | 10 per cent said this man was good looking
5 per cent said typical male                   | 39 per cent said 30 plus in age
25 per cent said a 20-year-old                 | 90 per cent said rugged and manly
48 per cent said has a girlfriend              | 38 per cent said has a girlfriend
46 per cent said has a boyfriend               | No one spoke of boyfriend
26 per cent said spendthrift                   | 21 per cent said thrifty
15 per cent said ‘girly’                       | 32 per cent said normal Indian male


Thus, as we can see, the normal Indian adult male is still going to take time
to include beauty or cosmetic products in his normal personal care basket. Thus,
it is wiser for marketers to target the younger metrosexual male who is a
heavy spender.


Another useful way of categorizing questionnaires is by the method of
administration. The questionnaire that has been prepared might necessitate
a face-to-face interaction. In this case, the interviewer reads out each question
and makes a note of the respondent’s answers. This kind of administration is called
a schedule. It might have a mix of the questionnaire types described in the section
above and might have some structured and some unstructured questions. The
investigator might also have a set of additional material like product prototypes or
copies of advertisements. The investigator might also have a predetermined set of
standardized questions or clarifications, which he can use to ask questions like
‘why do you say that?’, ‘can you explain this in detail?’ or ‘what I mean to ask
is …’. The other kind is the self-administered questionnaire, where the
respondent reads all the instructions and questions on his own and records his
own statements or responses. Thus, all the questions and instructions need to be
explicit and self-explanatory.


The selection of one over the other depends on certain study prerequisites. 


Population characteristics: In case the population is illiterate or unable to write 
the responses, then one must as a rule use the schedule, as the questionnaire 
cannot be effectively answered by the subject himself. 


Population spread: In case the sample to be studied is large and dispersed, then
one needs to use the questionnaire. Also, when the resources available for the
study (time, cost and manpower) are limited, schedules become expensive to
use and it is advisable to use a self-administered questionnaire.


Study area: In case one is studying a sensitive topic, like organizational climate
or quality of working life, where the presence of an investigator might skew the 
answers in a more positive direction, then it is better that one uses the questionnaire. 
However, in case the motives and feelings are not well-developed and structured, 
one might need to do additional probing and in that case a schedule is better. If the 
objective is to explore concepts or trace the reaction of the sample population to 
new ideas and concepts, a schedule is advisable. 


Check Your Progress 


5. What is the consequence of standardized or structured observation? 
6. Define a facing-moderator group. 
7. In which structure of interview is the probability of subjectivity very high? 
4.7 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. The secondary type of data has a significant time and cost advantage.

2. The secondary information collected can be used to design the primary
data collection instruments, in order to phrase and design the right questions.

3. Cash registers are examples of sales data type of secondary data sources.

4. Syndicated service agencies are organizations that collect organization/
product-category-specific data from a regular consumer base and create a
common pool of data that can be used by multiple buyers, for their individual
purpose.

5. The consequence of standardized or structured observation is that the
observer’s bias is reduced, and the authenticity and reliability of the
information collected is higher.

6. A facing-moderator group is a type of focus group in which the two
moderators take opposite sides on the topic being discussed and thus, in
the short time available, ensure that all possible perspectives are thoroughly
explored.

7. In the unstructured interview, the probability of subjectivity is very high.


4.8 SUMMARY


• The researcher has access to two major sources of data: original data from
primary sources, and secondary data.

• Secondary information is a useful, fast and cost-effective way of testing
and achieving the study objectives.

• Secondary data could be collected and compiled within the organization/
industry.

• Data collected from an outside source is termed an external data source.

• The observational method is the simplest method of primary data collection.

• This can be differentiated into structured versus unstructured observation,
and human versus mechanical observation.

• The focus group discussion is a cost-effective method and can ideally be
done on a small group of respondents to obtain meaningful data.

• The interview method involves a dialogue between the interviewee and the
interviewer. This can range from unstructured to completely structured.

• Today the interviewer can make use of the telephone as well as the computer
to assist him in conducting the interview.


4.9 KEY WORDS 


• External source of data: Information that is collected and compiled by an
outside source that is external to the organization.


• Focus group discussion: A form of structured group discussion involving
people with knowledge and interest in a particular topic and a facilitator.


• Online database: A database that can be accessed by computers.


• Primary data: Original, problem or project-specific and collected for the
specific objectives and needs spelt out by the researcher.


• Secondary data: Information which is not topical or research-specific and
has been collected and compiled by some other researcher or investigative
body.


• Syndicated data: Information gathered by a service or company for public
release and sold by subscription.


4.10 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Distinguish between secondary and primary methods of data collection. 
2. Explain the interview method of data collection. 
3. What are the advantages and disadvantages of secondary data? 
Long-Answer Questions 
1. How can secondary data be classified? Elaborate on each type with suitable 
examples. 


2. What is the observation method? What are the different types of observation 
methods available to the researcher? 


3. What are focus group discussions? What are the types of focus groups 
available to the researcher? 


4. What are the uses of the interview technique? What are the different types of
interviews?
5.0 INTRODUCTION


In the previous unit, we studied the various types, sources and methods of collecting 
data. In this unit, we will focus on different types of measurements and the statistical 
techniques that are applicable for the same. The various formats of a rating scale 
and the construction of the attitude measurement scale, along with the description 
of the distinct criteria involved in analysing a good measurement scale, are elaborated 
in this unit. 

The term ‘measurement’ means assigning numbers or some other symbols
to the characteristics of certain objects. When numbers are used, the researcher
must have a rule for assigning a number to an observation in a way that provides
an accurate description. We do not measure the object but some characteristics
of it. Therefore, in research, people/consumers are not measured; what is measured
are only their perceptions, attitudes or any other relevant characteristics. There are
two reasons for which numbers are usually assigned. First, numbers permit
statistical analysis of the resulting data and second, they facilitate the
communication of measurement results.
Scaling is an extension of measurement. Scaling involves creating a continuum
on which measurements on objects are located. Suppose you want to measure
the satisfaction level towards Kingfisher Airlines and a scale of 1 to 11 is used for
the said purpose. This scale indicates the degree of satisfaction, with 1 =
extremely dissatisfied and 11 = extremely satisfied.


5.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Define measurement
• Distinguish between the four types of measurement scales
• Define attitude and its three components
• Discuss the various classifications of scales
• Define measurement error and explain the criteria for good measurement


5.2 TYPES OF MEASUREMENT SCALES 


There are four types of measurement scales—nominal, ordinal, interval and ratio. 
We will discuss each one of them in detail. The choice of the measurement scale 
has implications for the statistical technique to be used for data analysis. 
Nominal scale: This is the lowest level of measurement. Here, numbers are 
assigned for the purpose of identification of the objects. Any object which is 
assigned a higher number is in no way superior to the one which is assigned a 
lower number. Each number is assigned to only one object and each object has 
only one number assigned to it. It may be noted that the objects are divided into 
mutually exclusive and collectively exhaustive categories. 
Example:
• What is your religion?
(a) Hinduism
(b) Sikhism
(c) Christianity
(d) Islam
(e) Any other (please specify)

A Hindu may be assigned the number 1, a Sikh the number 2, a Christian the
number 3 and so on. Any religion which is assigned a higher number is in no way
superior to the one which is assigned a lower number. The assignment of numbers
is only for the purpose of identification.

Nominal scale measurements are used for identifying food habits (vegetarian
or non-vegetarian), gender (male/female), caste, respondents, marital status, brands,
attributes, stores, the players of a hockey team and so on.


The assigned numbers cannot be added, subtracted, multiplied or divided.
The only arithmetic operation that can be carried out is a count of each category.
Therefore, a frequency distribution table can be prepared for the nominal scale
variables and the mode of the distribution can be worked out. One can also use the
chi-square test and compute the contingency coefficient using nominal scale variables.
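
As an illustration of the analyses permitted on nominal data, the sketch below builds a frequency distribution, finds the mode, and computes a chi-square statistic and Pearson's contingency coefficient, C = sqrt(chi-square / (chi-square + n)). All the data values and the helper name `chi_square` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import sqrt

# Hypothetical coded responses: 1 = Hinduism, 2 = Sikhism, 3 = Christianity, 4 = Islam
religion = [1, 1, 2, 3, 1, 4, 2, 1, 3, 1]

freq = Counter(religion)                        # frequency distribution
mode_code, mode_count = freq.most_common(1)[0]  # most frequent category


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


# Hypothetical cross-tabulation: food habit (rows) by gender (columns)
table = [[30, 20], [20, 30]]
chi2, n = chi_square(table)
contingency_coefficient = sqrt(chi2 / (chi2 + n))

print(dict(freq), mode_code)
print(chi2, round(contingency_coefficient, 3))
```

Note that the codes 1 to 4 are never added or averaged; only the counts per category enter the calculations.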


Ordinal scale: This is the next higher level of measurement than the nominal scale
measurement. One of the limitations of the nominal scale measurements is that we
cannot say whether the number assigned to an object is higher or lower than the
one assigned to another object. The ordinal scale measurement takes care of this
limitation. An ordinal scale measurement tells whether an object has more or less
of a characteristic than some other object. However, it cannot answer how much
more or how much less.


Example: 


• Rank the following attributes while choosing a restaurant for dinner. The
most important attribute may be ranked one, the next important may be
assigned a rank of 2 and so on.


In the ordinal scale, the assigned ranks cannot be added, multiplied,
subtracted or divided. One can compute the median, percentiles and quartiles
of the distribution. The other major statistical analyses which can be carried
out are the rank order correlation coefficient and the sign test. All the statistical
techniques which are applicable in the case of nominal scale measurement
can also be used for the ordinal scale measurement. However, the reverse
is not true. This is because ordinal scale data can be converted into nominal
scale data but not the other way round.
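
The statistics mentioned for ordinal data can be sketched in Python as follows. The rankings are hypothetical, and the helper `spearman_rho` uses the textbook formula 1 − 6Σd²/(n(n² − 1)), which assumes that there are no tied ranks.

```python
from statistics import median, quantiles

# Hypothetical ranks given by two respondents to the same five restaurant attributes
ranks_a = [1, 2, 3, 4, 5]
ranks_b = [2, 1, 4, 3, 5]


def spearman_rho(x, y):
    """Rank order correlation for two untied rankings of the same items."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(ranks_a, ranks_b)

# Hypothetical ranks that nine respondents gave to a single attribute
attribute_ranks = [1, 2, 2, 3, 3, 3, 4, 5, 5]
med = median(attribute_ranks)
q1, q2, q3 = quantiles(attribute_ranks, n=4)   # quartiles

print(rho, med, (q1, q2, q3))
```

The median and quartiles depend only on the order of the ranks, which is why they are admissible here while the arithmetic mean is not.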


Interval scale: The interval scale measurement is the next higher level of 
measurement. It takes care of the limitation of the ordinal scale measurement where 
the difference between the score on the ordinal scale does not have any meaningful 
interpretation. In the interval scale the difference of the score on the scale has 
meaningful interpretation. It is assumed that the respondent is able to answer the 
questions on a continuum scale. The mathematical form of the data on the interval
scale may be written as


Y = a + bX, where a ≠ 0
In the interval scale, the difference in score has a meaningful interpretation
while the ratio of the score on this scale does not have a meaningful interpretation.
This can be seen from the following interval scale question:


• How likely are you to buy a new designer carpet in the next six months?


          Very unlikely | Unlikely | Neutral | Likely | Very likely
Scale A         1       |    2     |    3    |   4    |      5
Scale B         0       |    1     |    2    |   3    |      4
Scale C        −2       |   −1     |    0    |   1    |      2


Suppose a respondent ticks the response category ‘likely’ and another
respondent ticks the category ‘unlikely’. Whichever of the scales A, B or C we use,
the difference between the two scores is 2. However, when the
ratio of the scores is taken, it is 2, 3 and −1 for the scales A, B and C respectively.
Therefore, the ratio of the scores on the scale does not have a meaningful
interpretation. The following are some examples of interval scale data.


• How important is price to you while buying a car?

Least important | Unimportant | Neutral | Important | Most important
       1        |      2      |    3    |     4     |       5

• How do you rate the work environment of your organization?

Very good | Good | Neither good nor bad | Bad | Very bad
    5     |  4   |          3           |  2  |    1

• How expensive is the restaurant ‘Punjabi By Nature’?

Extremely | Definitely | Somewhat  | Somewhat    | Definitely  | Extremely
expensive | expensive  | expensive | inexpensive | inexpensive | inexpensive
    1     |     2      |     3     |      4      |      5      |      6


The numbers on this scale can be added, subtracted, multiplied or divided.
One can compute the arithmetic mean, standard deviation and correlation coefficient
and conduct a t-test, Z-test, regression analysis and factor analysis. As interval
scale data can be converted into ordinal and nominal scale data,
all the techniques applicable for the ordinal and the nominal scale data can also be
used for interval scale data.
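
The comparison of scales A, B and C discussed earlier can be checked numerically. The short sketch below uses the codes implied by the text for the categories ‘unlikely’ and ‘likely’ (a difference of 2 on every scale and ratios of 2, 3 and −1); the dictionary layout is simply an assumption of this illustration.

```python
# Codes for the two ticked categories under each coding scheme
scales = {
    "A": {"unlikely": 2, "likely": 4},
    "B": {"unlikely": 1, "likely": 3},
    "C": {"unlikely": -1, "likely": 1},
}

differences = {name: s["likely"] - s["unlikely"] for name, s in scales.items()}
ratios = {name: s["likely"] / s["unlikely"] for name, s in scales.items()}

print(differences)   # the same difference under every coding
print(ratios)        # a different ratio under every coding
```

The difference survives a change of origin (the constant a in Y = a + bX), whereas the ratio does not, which is why only differences are interpreted on an interval scale.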


Ratio scale: This is the highest level of measurement and takes care of the limitations
of the interval scale measurement, where the ratio of the measurements on the
scale does not have a meaningful interpretation. The ratio scale measurement can
be converted into interval, ordinal and nominal scale. But the other way round is
not possible. The mathematical form of the ratio scale data is given by Y = bX. In
this case, there is a natural zero (origin), whereas in the interval scale we had an
arbitrary zero. Examples of the ratio scale data are weight, distance travelled,
income and sales of a company, to mention a few.


All the mathematical operations can be carried out using the ratio scale
data. In addition to the statistical analysis mentioned in the interval, the ordinal and
the nominal scale data, one can compute coefficient of variation, geometric mean
and harmonic mean using the ratio scale measurement.
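
As a small illustration of the additional statistics available for ratio scale data, the sketch below uses Python's standard `statistics` module. The sales figures are hypothetical, and the coefficient of variation is taken here as the sample standard deviation divided by the mean.

```python
from statistics import geometric_mean, harmonic_mean, mean, stdev

# Hypothetical monthly sales of a company (any ratio scale variable would do)
sales = [10.0, 20.0, 40.0]

cv = stdev(sales) / mean(sales)   # coefficient of variation
gm = geometric_mean(sales)        # cube root of 10 * 20 * 40
hm = harmonic_mean(sales)         # 3 / (1/10 + 1/20 + 1/40)

# Changing the unit (Y = bX) leaves the coefficient of variation unchanged
sales_in_other_unit = [100 * x for x in sales]
cv_other_unit = stdev(sales_in_other_unit) / mean(sales_in_other_unit)

print(cv, gm, hm, cv_other_unit)
```

These measures rely on a natural zero, so they would not be meaningful for the interval scale codings discussed earlier.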


5.2.1 Attitude 


An attitude is viewed as an enduring disposition to respond consistently in a given 
manner to various aspects of the world, including persons, events and objects. A 
company is able to sell its products or services when its customers have a favourable 
attitude towards its products/services. In the reverse scenario, the company will 
not be able to sustain itself for long. It, therefore, becomes very important to 
measure the attitude of the customers towards the company’s products/services. 
Unfortunately, attitude cannot be measured directly. In order to measure an attitude, 
we make an inference based on the perceptions the customers have about the 
product/services. The attitude is derived from the perceptions. If the consumers 
have a favourable perception towards the products/services, the attitude will be 
favourable. Therefore, the attitudes are indirectly observed. 


Basically, attitude has three components: cognitive, affective and intention 
(or action) components. 


Cognitive component: This component represents an individual’s information 
and knowledge about an object. It includes awareness of the existence of the 
object, beliefs about the characteristics or attributes of the object and judgement 
about the relative importance of each of the attributes. In a survey, if the respondents 
are asked to name the companies manufacturing plastic products, some respondents 
may remember names like Tupperware, Modicare and Pearl Pet. This is called 
unaided recall awareness. More names are likely to be remembered when the 
investigator makes a mention of them. This is aided recall. The examples of beliefs 
or judgements could be that the products of Tupperware are of high quality, non- 
toxic and can be used in parties; a mutton dish can be cooked in a pressure 
cooker in less than 30 minutes and so on. 


Affective component: The affective component summarizes a person’s overall 
feeling or emotions towards the objects. The examples for this component could 
be: the food cooked in a pressure cooker is tasty, taste of orange juice is good or 
the taste of bitter gourd is very bad. 


Intention or action component: This component of an attitude, also called the
behavioural component, reflects a predisposition to an action by reflecting the 
consumer’s buying or purchase intention. It also reflects a person’s expectations 
of future behaviour towards an object. 
There is a relationship between attitude and behaviour. If a consumer does
not have a favourable attitude towards the product, he/she will certainly not buy
the product. However, having a favourable attitude does not mean that it would
be reflected in the purchase behaviour. This is because the intention to buy a product
has to be backed by the purchasing power of the consumer. Therefore, a
favourable attitude is a necessary condition for the purchase of the product but
it is not a sufficient condition. This relationship could hold true at the aggregate
level but not at the individual level.


Check Your Progress 
1. State the limitation of nominal scale which is taken care of by the ordinal 
scale. 
2. What is the highest level of measurement? 


3. Define the intention component of attitude. 


5.3 CLASSIFICATION OF SCALES 


One way of classifying scales is in terms of the number of items in the
scale. Based upon this, the following classification may be proposed:


5.3.1 Single Item vs Multiple Item Scale 
Single item scale: In the single item scale, there is only one item to measure a 
given construct. For example: 
Consider the following question: 
e How satisfied are you with your current job? 
Very Dissatisfied 
Dissatisfied 
Neutral 
Satisfied 
Very satisfied 


The problem with the above question is that there are many aspects to a
job, like pay, work environment, rules and regulations, security of job and
communication with the seniors. The respondent may be satisfied with some of
these factors but not with others. By asking a question as stated above, it will be
difficult to analyse the problem areas. To overcome this problem, a multiple item
scale is proposed.


Multiple item scale: In multiple item scale, there are many items that play a role
in forming the underlying construct that the researcher is trying to measure. This is
because each of the items forms some part of the construct (satisfaction) which the
researcher is trying to measure. As an example, some of the following questions
may be asked in a multiple item scale.

• How satisfied are you with the pay you are getting on your current job?
Very dissatisfied
Dissatisfied
Neutral
Satisfied
Very satisfied

• How satisfied are you with the rules and regulations of your organization?
Very dissatisfied
Dissatisfied
Neutral
Satisfied
Very satisfied
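
One simple way of working with a multiple item scale is sketched below. The numeric codes 1 to 5, the averaging of item codes into an overall score and the item names are all assumptions made for this illustration; the text above does not prescribe a particular scoring rule.

```python
# Assumed coding of the five response categories
CODES = {
    "Very dissatisfied": 1,
    "Dissatisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very satisfied": 5,
}


def satisfaction_score(responses):
    """Average the 1-5 codes of one respondent's answers across items."""
    values = [CODES[answer] for answer in responses.values()]
    return sum(values) / len(values)


# Hypothetical answers of one respondent to three job-satisfaction items
respondent = {
    "pay": "Dissatisfied",
    "rules and regulations": "Satisfied",
    "work environment": "Very satisfied",
}

score = satisfaction_score(respondent)
problem_area = min(respondent, key=lambda item: CODES[respondent[item]])

print(round(score, 2), problem_area)
```

Unlike the single item question, the item-level answers make it possible to see which aspect of the job is the problem area.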


5.3.2 Comparative vs Non-comparative Scales 


The scaling techniques used in research can also be classified into comparative 
and non-comparative scales (Figure 5.1). 


[Figure 5.1: tree diagram. Scaling Techniques divide into Comparative Scales
(Paired Comparison, Rank Order, Constant Sum, Q-Sort and other Procedures)
and Non-comparative Scales (Graphic Rating Scale, also called Continuous
Rating Scale).]


Fig. 5.1 Types of Scaling Techniques
Comparative scales


In comparative scales it is assumed that respondents make use of a standard
frame of reference before answering the question. For example, a question like
‘How do you rate Barista in comparison to Cafe Coffee Day on quality of
beverages?’ is an example of the comparative rating scale. It
involves the direct comparison of stimulus objects. Example:


• Please rate Domino’s in comparison to Pizza Hut on the basis of your
satisfaction level on an 11-point scale, based on the following parameters
(1 = Extremely poor, 6 = Average, 11 = Extremely good). Circle your
response:

a. Variety of menu options            1  2  3  4  5  6  7  8  9  10  11
b. Value for money                    1  2  3  4  5  6  7  8  9  10  11
c. Speed of service (delivery time)   1  2  3  4  5  6  7  8  9  10  11
d. Promotional offers                 1  2  3  4  5  6  7  8  9  10  11
e. Food quality                       1  2  3  4  5  6  7  8  9  10  11


Comparative scale data is generally interpreted in relative terms. Each of the
comparative rating scales is discussed in detail below:


Paired comparison scales: Here a respondent is presented with two objects
and is asked to select one according to whatever criterion he or she wants to use.
The resulting data from this scale is ordinal in nature. As an example, suppose a
parent wants to offer one of four items to a child: chocolate, burger, ice
cream and pizza. The child is asked to choose one item from each of the six
possible pairs, i.e., chocolate or burger, chocolate or ice cream, chocolate or
pizza, burger or ice cream, burger or pizza and ice cream or pizza. In general, if
there are n items, the number of paired comparisons would be
n(n − 1)/2. The paired comparison technique is useful when the number of items is
limited because it requires a direct comparison and overt choice.
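
The count of n(n − 1)/2 pairs can be verified with a few lines of Python. The four items come from the example above; the recorded choices are hypothetical and serve only to show how the pairwise choices yield an ordinal ranking.

```python
from collections import Counter
from itertools import combinations

items = ["chocolate", "burger", "ice cream", "pizza"]

pairs = list(combinations(items, 2))   # every unordered pair exactly once
n = len(items)
expected_pairs = n * (n - 1) // 2

# Hypothetical choices: in this made-up case the child always picks the
# first-listed item of each pair
choices = {pair: pair[0] for pair in pairs}
wins = Counter(choices.values())
ranking = sorted(items, key=lambda item: -wins[item])

print(len(pairs), expected_pairs)
print(ranking)
```

With 10 items the same formula already gives 45 pairs, which illustrates why the technique is practical only for a limited number of items.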


Rank order scaling: In the rank order scaling, respondents are presented with
several objects simultaneously and asked to order or rank them according to
some criterion. Consider, for example, the following question:


• Rank the following soft drinks in order of your preference. The most preferred
soft drink should be ranked one, the second most preferred should be ranked
two and so on.


Soft Drinks     Rank
Coke            ____
Pepsi           ____
Limca           ____
Sprite          ____
Mirinda         ____
Seven Up        ____
Fanta           ____


Like paired comparison, this approach is also comparative in nature. The
problem with this scale is that if a respondent does not like any of the above-
mentioned soft drinks and is forced to rank them in the order of his choice, then the
soft drink which is ranked one should be treated as the least disliked soft drink
and similarly, the other rankings can be interpreted. The rank order scaling results
in ordinal data.


Constant sum rating scaling: In constant sum rating scale, the respondents are 
asked to allocate a total of 100 points between various objects and brands. The 
respondent distributes the points to the various objects in the order of his preference. 
Consider the following example: 


• Allocate a total of 100 points among the various schools into which you
would like to admit your child. The points should be allocated in such a way
that the sum total of the points allocated to various schools adds up to 100.


Schools                    Points
DPS                        ____
Mother’s International     ____
APEEJAY                    ____
DAV Public School          ____
Laxman Public School       ____
TOTAL POINTS               100


Suppose Mother’s International is awarded 30 points, whereas Laxman
Public School is awarded 15 points. One can then make a statement that the respondent
rates Mother’s International twice as high as Laxman Public School. This type of
data is not only comparative in nature but could also result in ratio scale measurement.
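
The constant sum example can be checked with a short Python sketch. Only the 30 and 15 points come from the text; the remaining allocations and the helper `is_valid_allocation` are hypothetical and were chosen so that the total is 100.

```python
def is_valid_allocation(points, total=100):
    """A constant sum response is usable only if the points add up to the total."""
    return all(p >= 0 for p in points.values()) and sum(points.values()) == total


# 30 and 15 are from the example above; the other values are made up
points = {
    "DPS": 25,
    "Mother's International": 30,
    "APEEJAY": 20,
    "DAV Public School": 10,
    "Laxman Public School": 15,
}

valid = is_valid_allocation(points)
ratio = points["Mother's International"] / points["Laxman Public School"]

print(valid, ratio)
```

A response whose points do not add up to 100 would have to be rejected or corrected before ratios of this kind are interpreted.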


Q-sort technique: This technique makes use of the rank order procedure in
which objects are sorted into different piles based on their similarity with respect
to a certain criterion. Suppose there are 100 statements and an individual is asked
to sort them into five piles, in such a way that the strongly agreed statements
are put in one pile, agreed statements in another pile, neutral
statements in the third pile, disagreed statements in the fourth pile and
strongly disagreed statements in the fifth pile. The data generated in
this way would be ordinal in nature. The number of statements in
each pile should be such that the resulting data may follow a normal distribution.


Non-comparative scales 


In the non-comparative scales, the respondents do not make use of any frame of 
reference before answering the questions. The resulting data is generally assumed 
to be interval or ratio scale. 


The non-comparative scales are divided into two categories, namely, the 
graphic rating scales and the itemized rating scales. A useful and widely used itemized 
rating scale is the Likert scale. 
Graphic rating scale


This is a continuous scale, also called the continuous rating scale. In the graphic rating
scale the respondent is asked to tick his preference on a graph. Consider for
example the following question:


• Please put a tick mark (✓) on the following line to indicate your preference
for fast food.

◄──────────────────────────────────────►
Least                               Most
Preferred                      Preferred


To measure the preference of an individual towards fast food, one has to
measure the distance from the extreme left to the position where the tick mark has
been put. The greater the distance, the higher the individual’s preference for fast
food. This scale suffers from two limitations. First, if a respondent has put a tick
mark at a particular position and after ten minutes is given another form
on which to put a tick mark, it will be virtually impossible to put the tick at the same
position as before. Does it mean that the respondent’s preference for fast food
has undergone a change in 10 minutes? The basic assumption in this scale is that
the respondents can distinguish fine shades of difference in preference/
attitude, which need not be the case. Second, the coding, editing and tabulation of
data generated through such a procedure is a very tedious task, and researchers
try to avoid using it.


Itemized rating scale 


In the itemized rating scale, the respondents are provided with a scale that has a 
brief description associated with each of the response categories. The 
response categories are ordered in terms of the scale position and the respondents 
are supposed to select the category that best describes the object being rated. 
There are certain issues that should be kept in mind while 
designing the itemized rating scale. These issues are: 


Number of categories to be used: There is no hard and fast rule as to how 
many categories should be used in an itemized rating scale. However, it is a common 
practice to use five or six categories. Some researchers are of the opinion that more than 
five categories should be used in situations where small changes in attitudes are to 
be measured. Others argue that the respondents would find it difficult 
to distinguish between more than five categories. 


Odd or even number of categories: It has been a matter of debate among 
researchers as to whether an odd or an even number of categories should be used. With 
an even number of categories the scale would not have a neutral category and 
the respondent will be forced to choose either the positive or the negative side of 
the attitude. If an odd number of categories is used, the respondent has the freedom 
to be neutral if he or she wants to be so. 


Balanced versus unbalanced scales: A balanced scale is one which has an 
equal number of favourable and unfavourable categories. The following is an 
example of a balanced scale: 

• How important is price to you in buying a new car? 

Very important 

Relatively important 

Neither important nor unimportant 

Relatively unimportant 

Very unimportant 

In this question, there are five response categories, two of which emphasize 
the importance of price and two others that do not show its importance. The 
middle category is neutral. 

The following is an example of an unbalanced scale: 

• How important is price to you in buying a new car? 

More important than any other factor 

Extremely important 

Important 

Somewhat important 

Unimportant 

In this question there are four response categories that are skewed towards 
the importance given to the price, whereas one category is for the unimportant 
side. Therefore, this question is an unbalanced question. 


Nature and degree of verbal description: Many researchers believe that each 
category must have a verbal, numerical or pictorial description. Verbal descriptions 
should be clearly and precisely worded so that the respondents are able to 
differentiate between them. Further, the researcher must decide whether to label 
every scale category, some scale categories, or only the extreme scale categories. 


Forced versus non-forced scales: In the forced scale, the respondent is forced 
to take a stand, whereas in the non-forced scale, the respondent can be neutral if 
he/she so desires. The argument for a forced scale is that those who are reluctant 
to reveal their attitude are encouraged to do so with the forced scale. Paired 
comparison scale, rank order scale and constant sum rating scales are examples 
of forced scales. 


Physical form: There are many options available for the presentation of 
the scales. A scale could be presented vertically or horizontally. The categories could be 
expressed in boxes, discrete lines or as units on a continuum. They may or may 
not have numbers assigned to them. The numerical values, if used, may be positive, 
negative or both. 


Suppose we want to measure the perception about Jet Airways using a 
multi-item scale. One of the questions is about the behaviour of the crew members. 
Given below is a set of scale configurations that may be used to measure their 
behaviour. 


The behaviour of the crew members of Jet Airways is: 


1. Very bad                                        Very good 

2. Very bad      1      2      3      4      5     Very good 

3.    -2        -1             0             1         2 
   Very bad          Neither bad nor good          Very good 


Below we describe the Likert scale, which is very commonly used in survey 
research. 


Likert scale: This is a multiple-item, agree-disagree, five-point scale. The 
respondents are given a certain number of items (statements) on which they are 
asked to express their degree of agreement/disagreement. This is also called a 
summated scale because the scores on individual items can be added together to 
produce a total score for the respondent. An assumption of the Likert scale is that 
each of the items (statements) measures some aspect of a single common factor; 
otherwise the scores on the items cannot legitimately be summed up. In a typical 
research study, there are generally 25 to 30 items on a Likert scale. 


To construct a Likert scale to measure a particular construct, a large number 
of statements pertaining to the construct are listed. The number of statements could 
range from 80 to 120. The identification of the statements is done through exploratory 
research, which is carried out by conducting a focus group, unstructured interviews 
with knowledgeable people, a literature survey, analysis of case studies and so on. 
Suppose we want to assess the image of a company. As a first step, exploratory 
research may be conducted by having informal interviews with the customers 
and employees of the company. The general public may also be contacted. A 
survey of the literature on the subject may also give information that could 
be useful for constructing the statements. Suppose 100 statements are listed to 
measure the construct. A representative sample of respondents is then asked 
to state their degree of agreement/disagreement with those 
statements. Table 5.1 gives a few statements to assess the image of the company. 


It may be noted that only anchor labels and no numerical values are assigned 
to the response categories. Once the scale is administered, numerical values are 
assigned to the response categories. The scale contains statements, some of which 
are favourable to the construct we are trying to measure and some unfavourable 
to it. 


For example, out of the ten statements given, statements numbering 1, 2, 4, 
6 and 9 in Table 5.1 are favourable statements, whereas the remaining are 
unfavourable statements. The reason for having a mixture of favourable and 
unfavourable statements in a Likert scale is that the responses by the respondent 
should not become monotonous while answering the questions. Generally, in a 
Likert scale, there is an approximately equal number of favourable and unfavourable 
statements. Once the scale is administered, numerical values are assigned to the 
responses. The rule is that a ‘strongly agree’ response for a favourable statement 
should get the same numerical value as the ‘strongly disagree’ response of the 
unfavourable statement. Suppose for a favourable statement the numbering is done 
as Strongly disagree = 1, Disagree = 2, Neither agree nor disagree = 3, Agree = 
4 and Strongly agree = 5. Accordingly, an unfavourable statement would get the 
numerical values as Strongly disagree = 5, Disagree = 4, Neither agree nor disagree 
= 3, Agree = 2 and Strongly agree = 1. In order to measure the image that the 
respondent has about the company, the scores are added. 


Table 5.1 Likert Scale Statements to Measure the Image of the Company 


1. The company makes quality products. 
2. It is a leader in technology. 
3. It doesn’t care about the general public. 
4. The company leads in R&D to improve products. 
5. The company is not a good paymaster. 
6. The products of the company go through stringent quality tests. 
8. It does not care about the community near its plant. 
9. The company’s stocks are good to buy or own. 
10. The company does not have good labour relations. 


For example, if a respondent has ticked (✓) statements numbering from 
one to ten as shown in Table 5.1, his total score would be 3 + 5 + 4 + 4 + 5 + 4 
+ 4 + 5 + 4 + 4 = 42 out of 50. Now if there are 100 respondents and 100 
statements, the score on the image of the company can be worked out for each 
respondent by adding his/her scores on the 100 statements. The minimum score 
for each respondent will be 100, whereas the maximum score would be 500. 
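

The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the ten responses are assumed values chosen so that the item scores match the sum quoted in the text (the actual tick marks of Table 5.1 are not reproduced here). 

```python
# Items the text identifies as favourable; the remaining items are unfavourable.
FAVOURABLE = {1, 2, 4, 6, 9}

# Numerical coding for a favourable statement, as given in the text.
CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither": 3,
    "agree": 4,
    "strongly agree": 5,
}


def item_score(item_no, response):
    """Score one response, reversing the coding for unfavourable items."""
    raw = CODES[response]
    return raw if item_no in FAVOURABLE else 6 - raw


def total_score(responses):
    """Summated (Likert) score: the sum of the item scores."""
    return sum(item_score(i, r) for i, r in responses.items())


# Assumed responses of one respondent (illustrative only).
responses = {
    1: "neither", 2: "strongly agree", 3: "disagree", 4: "agree",
    5: "strongly disagree", 6: "agree", 7: "disagree",
    8: "strongly disagree", 9: "agree", 10: "disagree",
}

print(total_score(responses), "out of", 5 * len(responses))
```

Because an unfavourable item is scored as 6 minus the raw code, a ‘strongly disagree’ on an unfavourable statement earns the same 5 points as a ‘strongly agree’ on a favourable one. 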


As mentioned earlier, a typical Likert scale comprises about 25-30 
statements. In order to select 25 statements from the 100 statements, we need to 
discard some of them. The rule behind discarding the statements is that those 
items that are non-discriminating should be removed. The procedure for choosing 
(say) 25 statements is described below. 


As mentioned earlier, the score for each of the respondents on each of the 
statements can be used to measure his/her total score about the image of the 
company. The data may look as given in Table 5.2. 


Table 5.2 Total Score and Individual Score of Each Respondent on Various Statements 


Total Score 


Table 5.2 shows that the total score for respondent no. 1 is 410, whereas 
for respondent no. 2 it is 209. This means that respondent no. 1 has a more 
favourable image of the company as compared to respondent no. 2. Now, in 
order to select 25 statements, let us consider statements numbering i and j. We 
note that statement no. j is more discriminating as compared to statement no. 
i. This is because the score on statement j is more highly correlated with the total 
score than the score on statement i. Therefore, if we have to choose 
between i and j, we will choose statement no. j. From this we can conclude that 
only those statements will be selected which have a very high correlation with the 
total score. Therefore, the 100 correlations, one for each statement, are to be 
arranged in descending order of magnitude and only the top 25 statements 
having a high correlation with the total score need to be selected. 
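

The selection procedure above can be sketched as follows. The data matrix is made up for illustration (five respondents, four statements), the function names are hypothetical, and the Pearson correlation coefficient is assumed as the measure of correlation, since the text does not name one. 

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_items(scores, k):
    """Return the indices of the k statements most correlated with the total.

    scores[r][j] is the score of respondent r on statement j.
    """
    totals = [sum(row) for row in scores]
    ranked = []
    for j in range(len(scores[0])):
        column = [row[j] for row in scores]
        ranked.append((pearson(column, totals), j))
    ranked.sort(reverse=True)  # descending order of correlation
    return [j for _, j in ranked[:k]]


# Illustrative scores: statement 1 (index 1) gets the same answer from
# everyone, so it cannot discriminate between respondents.
scores = [
    [5, 3, 5, 4],
    [4, 3, 4, 2],
    [3, 3, 2, 5],
    [2, 3, 2, 1],
    [1, 3, 1, 3],
]

print(select_items(scores, k=2))
```

With 100 statements and k = 25, the same function would return the 25 statements to retain. 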


Check Your Progress 
4. What is the Q-sort technique? 


5. Give some examples of forced scales. 


6. Why is the Likert scale also called a summated scale? 


5.4 MEASUREMENT ERROR 


Measurement error occurs when the observed measurement on a construct or 
concept deviates from its true value. The following is a list of the sources of 
measurement errors. 


• Factors like the mood, fatigue and health of the respondent may influence 
the observed response while the instrument is being administered. Other 
factors could be education, job, awareness of the topic and reluctance to 
express an opinion. 

• Variations in the environment in which measurements are taken may 
also result in a departure from the true value. 

• At times, errors may be committed at the time of coding, while entering 
data from the questionnaire into a spreadsheet on the computer and at the 
tabulation stage. Another source could be a defective instrument for data 
collection, such as a lengthy and ambiguous questionnaire with leading 
questions (suggestive responses). 


The observed measurement in any research need not be equal to the true 
measurement. The observed measurement can be written as 


O = T + S + R 

where 
O = Observed measurement 
T = True score 
S = Systematic error 
R = Random error 


It may be noted that the total error consists of two components: systematic 
error and random error. Systematic error causes a constant bias in the measurement. 
Suppose there is a weighing scale that weighs 50 gm less for every one kg of 
product being weighed. The error would remain the same irrespective 
of the kind of product and the time at which the product is weighed. Random error, on 
the other hand, involves influences that affect the measurements but are not systematic. 
Suppose we use different weighing scales to weigh one kg of a product. If 
systematic error is assumed to be absent, we may find that the recorded weights 
fall within a range around the true value of the weight; this scatter is the random 
error. 
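

A small simulation can make the decomposition O = T + S + R concrete. The sketch below reuses the weighing-scale example: the true weight is 1000 gm and the systematic error is -50 gm. The random error is assumed, purely for illustration, to be normally distributed with mean zero and a standard deviation of 5 gm; the text does not specify its distribution. 

```python
import random


def observe(true_value, systematic_error, random_sd, rng):
    """One observed measurement: O = T + S + R."""
    random_error = rng.gauss(0, random_sd)  # assumed zero-mean normal error
    return true_value + systematic_error + random_error


rng = random.Random(0)   # fixed seed so that the run is repeatable
TRUE_WEIGHT = 1000.0     # T, in gm
BIAS = -50.0             # S, the scale reads 50 gm less per kg

readings = [observe(TRUE_WEIGHT, BIAS, 5.0, rng) for _ in range(10_000)]
mean_reading = sum(readings) / len(readings)

# Random errors tend to cancel out in the average; the systematic error does not.
estimated_bias = mean_reading - TRUE_WEIGHT
print(round(estimated_bias, 1))
```

Averaging many readings brings the mean close to T + S rather than to T, which illustrates why repeated measurement reduces random error but leaves the constant bias untouched. 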


5.4.1 Criteria for Good Measurement 


There are three criteria for evaluating measurements: reliability, validity and sensitivity. 
It may be noted that there is a relationship between reliability and sensitivity. If we 
want to make an item more sensitive, it may be achieved at the cost of reliability. 
This means that to get more sensitivity, the researcher might have to compromise on 
reliability. 

1. Reliability 

Reliability is concerned with the consistency, accuracy and predictability of the scale. 
It refers to the extent to which a measurement process is free from random errors. 
The reliability of a scale can be measured using the following methods: 


Test-retest reliability: In this method, repeated measurements of the same person 
or group using the same scale under similar conditions are taken. A very high 
correlation between the two scores indicates that the scale is reliable. The researcher 
has to be careful in deciding the time difference between the two observations. If the 
time difference is very small, it is very likely that the 
respondent would give the same answers, which could result in a higher correlation. Further, 
if the difference is too large, the attitude might have undergone a change during 
that period, resulting in a weak correlation and hence poor reliability. Generally, a 
time difference of about 5-6 months is considered an 
ideal period. 


Split-half reliability method: This method is used in the case of multiple-item 
scales. Here the items are randomly divided into two parts and a 
correlation coefficient between the scores on the two parts is obtained. A high 
correlation indicates high internal consistency of the scale and hence greater reliability. 
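

The split-half method can be sketched as below. The respondent data are invented for illustration, the function names are hypothetical, and the Pearson coefficient is assumed as the correlation measure. Each half is summarized by the respondent's total score on the items in that half. 

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def split_half_correlation(scores, rng):
    """Randomly split the items into two halves and correlate the half totals.

    scores[r][j] is the score of respondent r on item j.
    """
    items = list(range(len(scores[0])))
    rng.shuffle(items)
    half = len(items) // 2
    first, second = items[:half], items[half:]
    totals_first = [sum(row[j] for j in first) for row in scores]
    totals_second = [sum(row[j] for j in second) for row in scores]
    return pearson(totals_first, totals_second)


# Illustrative data: six respondents, four items on a five-point scale.
scores = [
    [5, 4, 5, 4],
    [4, 4, 3, 4],
    [3, 2, 3, 3],
    [2, 3, 2, 2],
    [1, 2, 1, 1],
    [4, 5, 4, 5],
]

r = split_half_correlation(scores, random.Random(1))
print(round(r, 2))
```

Because the split is random, different splits can give somewhat different coefficients; a fixed seed is used here only to make the run repeatable. 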


2. Validity 


The validity of a scale refers to the question of whether we are measuring what we 
want to measure. Validity of the scale refers to the extent to which the measurement 
process is free from both systematic and random errors. The validity of a scale is 
a more serious issue than reliability. There are different ways to measure validity. 


Content validity: This is also called face validity. It involves subjective judgement 
by an expert for assessing the appropriateness of the construct. For example, to 
measure the perception of a customer towards Kingfisher Airlines, a multiple item 
scale is developed. A set of 15 items is proposed. These items when combined in 
an index measure the perception of Kingfisher Airlines. In order to judge the content 
validity of these 15 items, a set of experts may be requested to examine the 
representativeness of the 15 items. The items covered may be lacking in the content 
validity if we have omitted behaviour of the crew, food quality, and food quantity, 
etc., from the list. In fact, conducting the exploratory research to exhaust the list of 
items measuring perception of the airline would be of immense help in such a case. 


Predictive validity: This involves the ability of a phenomenon measured at one 
point of time to predict another phenomenon at a future point of time. If the 
correlation coefficient between the two is high, the initial measure is said to have a 
high predictive ability. As an example, consider the use of the Common Admission 
Test (CAT) to shortlist candidates for admission to the MBA programme in a 
business school. The CAT scores are supposed to predict the candidate’s aptitude 
for studies in business education. 


3. Sensitivity 


Sensitivity refers to an instrument’s ability to accurately measure the variability in a 
concept. A dichotomous response category such as agree or disagree does not 
allow the recording of any attitude changes. A more sensitive measure with numerous 
categories on the scale may be required. For example, adding ‘strongly agree’, 
‘agree’, ‘neither agree nor disagree’, ‘disagree’ and ‘strongly disagree’ categories 
will increase the sensitivity of the scale. 

The sensitivity of a scale based on a single question or a single item can be 
increased by adding questions or items. In other words, because composite 
measures allow for a greater range of possible scores, they are more sensitive 
than a single-item scale. 


Check Your Progress 

7. What are systematic errors? 

8. Name the method in which the number of items is randomly divided into 
two parts and a correlation coefficient between the two is obtained. 


5.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. One of the limitations of the nominal scale measurements is that we cannot 
say whether the number assigned to an object is higher or lower than the 
one assigned to another object. The ordinal scale measurement takes care 
of this limitation. 

2. The highest level of measurement is the ratio scale. 

3. The intention component of attitude, also called the behavioural component, 
reflects a predisposition to an action by reflecting the consumer’s buying or 
purchase intention. 

4. The Q-sort technique makes use of the rank order procedure in which 
objects are sorted into different piles based on their similarity with respect 
to a certain criterion. 

5. Some examples of forced scales include paired comparison scale, rank 
order scale and constant sum rating scales. 

6. The Likert scale is also called a summated scale because the scores on 
individual items can be added together to produce a total score for the 
respondent. 

7. Systematic errors are one of the components of total error; they cause a 
constant bias in the measurement. 

8. Split-half reliability method is the method in which the number of items is 
randomly divided into two parts and a correlation between the two is 
obtained. 


5.6 SUMMARY 


• Measurement means the assignment of numbers or other symbols to the 
characteristics of certain objects. Scaling is an extension of measurement. 
Scaling involves creating a continuum on which measurements on the objects 
are located. There are four types of measurement scales: nominal, ordinal, 
interval and ratio scale. 

• Attitude is a predisposition of the individual to evaluate some object or 
symbol. Attitude has three components: cognitive, affective and 
intention or action component. 

• Scales can be classified as single-item and multiple-item scales. Another 
classification could be whether the scales are comparative or 
non-comparative in nature. 

• The observed measurement need not be equal to the true value of 
the measurement. Some systematic and random errors may be found in the 
observed measurement. There are three criteria for determining the accuracy 
of a measurement: reliability, validity and sensitivity. 


5.7 KEY WORDS 


Balanced scale: A scale that has equal number of favourable and 
unfavourable categories. 


Comparative scale: A scale in which respondents make use of some 
standard frame of reference before answering the question. 


Forced scale: A scale in which the respondent is forced to take a stand. 


Interval scale: A scale that makes use of an arbitrary origin. 


Validity: It deals with whether a scale measures what it is supposed to 
measure. 


5.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 
1. What is the meaning of measurement in research? Give examples. 
2. Outline the steps involved in constructing a Likert scale. 


3. Briefly explain the concepts of reliability, validity and sensitivity. 
Long-Answer Questions 
1. Discuss four types of measurements using examples. 


2. Define attitude. What are its various components? 


3. Explain an itemized rating scale. What are the various issues involved in 
constructing an itemized rating scale? 




5.9 FURTHER READINGS 


Chawla D and Sondhi N. 2016. Research Methodology: Concepts and Cases, 
2nd edition. New Delhi: Vikas Publishing House. 


Easterby-Smith, M, Thorpe, R and Lowe, A. 2002. Management Research: An 
Introduction, 2nd edn. London: Sage. 


Grinnell, Richard Jr (ed.). 1993. Social Work, Research and Evaluation, 4th 
edn. Itasca, Illinois: F E Peacock Publishers. 


Kerlinger, Fred N. 1986. Foundations of Behavioural Research, 3rd edn. New 
York: Holt, Rinehart and Winston. 




UNIT 6 QUESTIONNAIRE DESIGN 


Structure 
6.0 Introduction 
6.1 Objectives 
6.2 The Questionnaire Method 
6.2.1 Types of Questionnaire 
6.3 Process of Questionnaire Designing 
6.4 Advantages and Disadvantages of the Questionnaire Method 
6.5 Answers to Check Your Progress Questions 
6.6 Summary 
6.7 Key Words 
6.8 Self Assessment Questions and Exercises 
6.9 Further Readings 


6.0 INTRODUCTION 


In Unit 4, we discussed some of the methods of collecting primary data, such as 
observation, focus group discussion and interviews. However, a discussion on data collection 
would be incomplete if one did not talk about the questionnaire method. This is 
the most cost-effective and widely used method, apart from being extremely 
user-friendly. The questionnaire method is flexible enough to reveal data that is in the 
respondent’s own words and language. It can be made extremely scientific by 
framing questions which enable a very advanced level of quantitative measurement 
and analysis. The pattern of questioning is always designed keeping in mind the 
respondent’s comfort and ease of answering. Today, with the wide use of technology, 
it is very easy to use the questionnaire method even without being physically present 
in front of the respondent. 

Even though all of us have filled in a questionnaire at some time or the other 
and know what it must have, designing a well-structured and study-specific 
questionnaire requires a structured and logical path so that the effort of collecting 
information using the questionnaire is meaningful. In this unit you will learn about 
the various aspects of the questionnaire method in detail. The entire process of 
questionnaire designing will be discussed at length, with special reference to the 
different kinds of questionnaires available to the researcher. 


6.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Recognize the relevance of the questionnaire method in research 
• Describe the step-wise process involved in the design of a questionnaire 
• Define the content of the questions 
• Define the flow and sequence in the questioning method 
• Recognize the advantages and disadvantages of using the questionnaire 


6.2 THE QUESTIONNAIRE METHOD 


The questionnaire is a research technique that consists of a series of questions 
asked of respondents in order to obtain statistically useful information about a 
given topic. It is one of the most cost-effective methods of collecting primary data, 
which can be used with considerable ease by most individual and business 
researchers. It has the advantage of flexibility of approach and can be successfully 
adapted for most research studies. The instrument has been defined differently by 
various researchers. Some take the traditional view of a written document requiring 
the subject to record his/her own responses (Kervin, 1999). Others have taken a 
broader perspective to include the structured interview also as a questionnaire (Bell, 
1999). It is essentially a data-collection instrument that has a predesigned set of 
questions, following a particular structure (De Vaus, 2002). Since it includes a 
standard set of questions, it can be successfully used to collect information from a 
large sample in a reasonably short time period. 


However, the questionnaire is not the best method in all research 
studies. For example, at the exploratory stage, rather than a questionnaire, it is 
advisable to use a more unstructured interview. Secondly, when the number of 
respondents is small and one has to collect more subjective data, a questionnaire 
is not advisable. 


Criteria for designing a questionnaire 


There are certain criteria that must be kept in mind while designing the questionnaire. 
The first and foremost requirement is that the spelt-out research objectives must 
be converted into clear questions which will extract answers from the respondent. 
This is not as easy as it sounds. For example, if one is asked, ‘How many times 
did your teacher praise you in the week?’, it is very difficult to give an exact number. 
The second requirement is that the questionnaire should be designed to engage the 
respondent and encourage a meaningful response. For example, a questionnaire measuring stress 
cannot have a voluminous set of questions which fatigue the subject. The questions, 
thus, should encourage response and be easy to understand. Lastly, the questions 
should be self-explanatory and not confusing; otherwise the person will answer the 
way he understood the question and not in terms of what was asked. This will be 
discussed in detail later, when we discuss the wording of the questions. 


6.2.1 Types of Questionnaire 


There are many different types of questionnaire available to the researcher. The 
categorization can be done on the basis of a variety of parameters. The two criteria 
that are most frequently used for designing purposes are the degree of structure 
and the degree of concealment. Structure refers to the degree to which the response 
category has been defined. Concealment refers to the degree to which the purpose 
of the study is explained to the respondent. 


Instead of considering them as individual types, most research studies use a 
mixed format. Thus, they will be discussed here as a two-by-two matrix 
(Table 6.1). 


Table 6.1 Types of Questionnaire 


            | FORMALIZED | NON-FORMALIZED 
UNCONCEALED | Most research studies use standardised questionnaires like these | The response categories have more flexibility 
CONCEALED   | Used for assessing psychographic and subjective constructs | Questionnaires using projective techniques or sociometric analysis 


Let us discuss the types of questionnaires. Questionnaires can be categorized 
on the basis of their structure or method of administration. 


Based on the structure, questionnaires can be divided into the following 
categories: 


Formalized and unconcealed questionnaire: This is the one that is the most 
frequently used by all management researchers. For example, if a new brokerage 
firm wants to understand the investment behaviour of people, they would structure 
the questions and answers as follows: 


1. Do you carry out any investment(s)? 
Yes No 


If yes, continue, else terminate. 


2. Out of the following options, where do you invest? (tick all that apply). 


Precious metals , real estate , stocks , 
government instruments , mutual funds , 
any other 


This kind of structured questionnaire is easy to administer, and has both the 
questions as self-explanatory and the answer categories clearly defined. 


Formalized and concealed questionnaire: These questionnaires have a formal 
method of questioning; however, the purpose is not clear to the respondent. 
Research studies which are trying to find out the latent causes of behaviour and 
cannot rely on direct questions use these. For example, young people cannot be 
asked direct questions on whether they are likely to indulge in corruption at work. 
Thus, the respondent has to be given a set of questions that can give an indication 
of his basic values, opinions and beliefs, as these would influence how he 
would react to issues. 


Non-formalized and unconcealed: Some researchers argue that rather than giving 
the respondents pre-designed response categories, it is better to give them 
unstructured questions where they have the freedom of expressing themselves the 
way they want to. Some examples of these kinds of questions are given below: 


1. Why do you think Maggi noodles are liked by young children? 


2. How do you generally decide on where you are going to invest your money? 


3. Give THREE reasons why you believe that the show Satyamev Jayate 
has affected the common Indian person? 


The data obtained here is rich in content, but quantification cannot go beyond 
frequency and percentages to represent the findings. 


Non-formalized and concealed: If the objective of the research study is to uncover 
socially unacceptable desires and subconscious and unconscious motivations, the 
investigator makes use of questions of low structure and disguised purpose. 
However, these require interpretation that is highly skilled. Cost, time and effort 
are also much higher than others. 


Another useful way of categorizing questionnaires is on the basis of the method of 
administration. A questionnaire may be prepared so that it necessitates 
a face-to-face interaction. In this case, the interviewer reads out each question 
and makes a note of the respondent’s answers. This form of administration is called a 
schedule. It might have a mix of the questionnaire types described in the section 
above and might have some structured and some unstructured questions. The 
other kind is the self-administered questionnaire, where the respondent reads 
all the instructions and questions on his own and records his own statements or 
responses. Thus, all the questions and instructions need to be explicit and 
self-explanatory. 


The selection of one over the other depends on certain study prerequisites. 


Population characteristics: In case the population is illiterate or unable to write 
the responses, then one must as a rule use the schedule, as the questionnaire 
cannot be effectively answered by the subject himself. 


Population spread: In case the sample to be studied is large and widely spread, 
then one needs to use the self-administered questionnaire. Similarly, when the 
resources available for the study are limited, schedules become expensive to use 
and the self-administered questionnaire is better. 


Study area: In case one is studying a sensitive topic like harassment at work, a 
self-administered questionnaire is suggested. However, in case the study topic 
needs additional probing, a schedule is better. 


There is another categorization that is based upon the mode of administration; 
this will be discussed in later sections of the unit. 


Check Your Progress 


1. State the first and foremost requirement of a questionnaire. 


2. Which type of questionnaire is used if the objective of the research is to 
uncover socially unacceptable desires and subconscious and unconscious 
motivations? 


3. Name the categories of questionnaires on the basis of method of 
administration. 


6.3 PROCESS OF QUESTIONNAIRE DESIGNING 


Even though the questionnaire method is most used by researchers, designing a 
well-structured instrument needs considerable skill. Presented below is a 
standardized process that a researcher can follow. 


Figure 6.1 summarizes the steps involved in questionnaire design. 


1. Convert the Research Objectives into the Information Needed 
2. Method of Administering the Questionnaire 
3. Content of the Questions 
4. Motivating the Respondent to Answer 
5. Determining Type of Questions 
6. Question Design Criteria 
7. Determine the Questionnaire Structure 
8. Physical Presentation of the Questionnaire 
9. Pilot Testing the Questionnaire 
10. Administering the Questionnaire 
1. Convert the research objectives into information areas 


This is the first step of the design process. By this time the researcher is clear 
about the research questions, the research objectives, the variables to be studied, 
the information required and the characteristics of the population being studied. Once 
these tasks are done, one can prepare a tabular framework so that the questions 
which need to be developed become clear. This step-wise process is explained 
with an example in Table 6.2. 


Table 6.2 Framework for Identifying Information Needs 

Research Questions: 
    What is the nature of plastic bag usage amongst people in the NCR 
    (National Capital Region)? 

Research Objectives: 
    To identify the different uses of plastic bags. 
    To find out the method of disposal of plastic bags. 
    To find out who uses plastic bags. 
    To find out what is the level of consciousness that people have about 
    the environment. 

Variables to be Studied: 
    Usage behaviour 
    Demographic details 

Information (Primary Required): 
    Uses of plastic bags 
    Disposal of plastic bags 

Population to be Studied: 
    Consumers 
    Retailers 


2. Method of administration 


Once the researcher has identified his information area, he needs to specify how 
the information should be collected. The researcher usually has available to him a 
variety of methods for administering the study. The main methods are the personal 
schedule (discussed earlier in the unit) and the self-administered questionnaire sent through 
mail, fax or e-mail, or hosted on the web. There are different preconditions 
for using one method over the other (Table 6.3). 


Table 6.3 Mode of Administration and Design Implications 

[Table body not recoverable from the source. Legible row labels include sampling 
control, response rate and interviewer bias.] 
3. Content of the questionnaire 


The next step is to determine the matter to be included as questions in the measure. 
The researcher needs to do an objective quality check in order to see what research 
objective/information need the question would be covering before using any of the 
framed questions. 


How essential is it to ask the question? You must remember that the time of the 
respondent is precious and it should not be wasted. Unless a question is adding to 
the data needed for getting an answer to the research problem, it should not be 
included. For example, if one is studying the usage of plastic bags, then demographic 
questions on age group, occupation, education and gender might make sense but 
questions related to marital status, family size and the state to which the respondent 
belongs are not required as they have no direct relation with the usage or attitude 
towards plastic bags. 


Sometimes, especially in self-administered questionnaires, one may ask some 
neutral questions at the beginning of the questionnaire to establish an involvement 
and rapport. For example, for a biofertilizer usage study, the following question 
was asked: 
Farming for you is a: 
noble profession 
ancestral profession 
profession like any other 
profession that is not money making 
any other 
Do we need to ask several questions instead of a single one? After 
deciding on the significance of the question, one needs to ascertain whether a 
single question will serve the purpose or should more than one question be asked. 
For example, in a TV serial study, one may give ten popular serials to be ranked 
as 1 to 10 in order of preference. Then the second question after the ranking 
question is: 


‘Why do you like the serial (the one you ranked No. 1/prefer 
watching most)?’ (Incorrect) 


Here, one lady might say, ‘Everyone in my family watches it’, while another 
might say, ‘It deals with the problems of living in a typical Indian joint family system’ 
and yet another might say, ‘My friend recommended it to me’. 


Thus, we need to ask her: 

‘What do you like about ______?’ 
‘Who all in your household watch the serial?’ 
and 
‘How did you first hear about the serial?’ (Correct) 


4. Motivating the respondent to answer 


The questionnaire should be designed in a manner that it involves the respondent 
and motivates him/her to give information. There are different situations which 
might lead to this. Each of these is examined separately here: 


Does the person have the required information? It is sometimes found that the 
person has had no experience with the issue being studied. Look at the following 
question: 

How do you evaluate the negotiation skills module vis-à-vis the communication 
and presentation skills module? (Incorrect) 


In this case it might be that the person has not undergone one or even both 
the modules, so how can he compare? Thus, certain qualifying or filter questions 
must be asked. Filter questions enable the researcher to filter out the respondents 
who are not adequately informed. Thus, the correct question would have been: 

Have you been through the following training modules? 
• Negotiation skills module: Yes/No 
• Communication and presentation skills module: Yes/No 


In case the answer to both is yes, please answer the following question, or 
else move to the next question. 

How do you evaluate the negotiation skills module vis-à-vis the communication 
and presentation skills module? (Correct) 
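
The filter-question idea above can be sketched as simple skip logic. This is a minimal illustration only: the function name, the dictionary keys and the respondent data are hypothetical and are not part of any particular survey tool.

```python
def should_ask_comparison(attended: dict[str, bool]) -> bool:
    """Return True only when every module in the comparison was attended."""
    return all(attended.values())


# Hypothetical respondent who attended only one of the two modules.
respondent = {
    "negotiation_skills": True,
    "communication_and_presentation_skills": False,
}

if should_ask_comparison(respondent):
    next_step = "ask the comparison question"
else:
    next_step = "move to the next question"

print(next_step)
```

A respondent who answers "No" to either filter question is routed past the comparison question, so only adequately informed respondents are asked to compare the modules.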


Does the person remember? Many a time, the question asked might put too much 
stress on an individual’s memory. For example, consider the following question: 
How much did you spend on eating out last month? (Incorrect) 
Such questions are beyond any normal individual’s memory bank. Thus, the 
question listed above could have been rephrased as follows: 
When you go out to eat, on an average your bill amount is: 
Less than ₹100 
₹101–250 
₹251–500 
More than ₹500 
How often do you eat out in a week? 
1–2 times 
3–4 times 
5–6 times 
Every day (Correct) 
Can the respondent articulate? Sometimes the respondent might not know how 
to put the answer in clear words. For example, if you ask a respondent to: 


• Describe a river rafting experience. 

Most respondents would not know what phrases to use to give an answer. Thus, 
in the above case, one can provide answer categories to the person as follows: 


Describe the river rafting experience. (Correct) 


1   Unexciting      Exciting 
2   Bad             Good 
3   Boring          Interesting 
4   Cheap           Expensive 
5   Safe            Dangerous 


Sensitive information: There might be instances when the question being asked 
might be embarrassing to the respondents and thus they would not be comfortable 
in disclosing the data required. 


For example, questions such as the following will not get any answers. 


Have you ever used fake receipts to claim your medical allowance? 
(Incorrect) 


Have you ever spit tobacco on the road (to tobacco consumers)? 
(Incorrect) 


However, in case the socially undesirable habit is placed in the context of a third person, 
there is a better chance of getting some correct responses. Thus the questions 
should be rephrased as follows: 


Do you associate with people who use fake receipts to claim their 
medical allowance? (Correct) 


Do you think tobacco consumers spit tobacco on the road? (Correct) 


5. Determining the type of questions 


Different kinds of question-response options are available to the researcher 
(Figure 6.2). 


[Figure: Question Content branches into Open-ended and Closed-ended questions] 


Fig. 6.2 Types of Question-Response Options 
Open-ended questions 


In open-ended questions, the openness refers to the option of answering in one’s 
own words. They are also referred to as unstructured questions or free-response 
or free-answer questions. Some illustrations of this type are listed below: 


• What is your age? 
• Which is your favourite TV serial? 
• I like Nescafe because ______ 
• My career goal is to ______ 


Closed-ended questions 


In closed-ended questions, both the question and response formats are structured 
and defined. There are three kinds of formats as we observed earlier—dichotomous 
questions, multiple-choice questions and those that have a scaled response. 


i. Dichotomous questions: These are restrictive alternatives and provide the 
respondents only with two answers. These could be ‘yes’ or ‘no’, like or dislike, 
similar or different, married or unmarried, etc. 


Are you diabetic? Yes/No 
Have you read the new book by Dan Brown? Yes/No 
What kind of petrol do you use in your car? Normal/Premium 


Dichotomous questions are the easiest type of questions to code and analyse. 
They are based on the nominal level of measurement and are categorical or binary 
in nature. 
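
Because dichotomous answers are binary, coding them is mechanical. The sketch below shows one possible coding scheme (1 for the first alternative, 0 for the second); the function name and the sample responses are hypothetical.

```python
def code_dichotomous(answer: str, positive: str = "yes", negative: str = "no") -> int:
    """Code a two-alternative answer as 1 (positive) or 0 (negative)."""
    cleaned = answer.strip().lower()
    if cleaned == positive:
        return 1
    if cleaned == negative:
        return 0
    # Anything else is not a valid answer to a dichotomous question.
    raise ValueError(f"Unexpected answer: {answer!r}")


# Hypothetical answers to 'Are you diabetic? Yes/No'.
responses = ["Yes", "no", "YES", "No"]
codes = [code_dichotomous(r) for r in responses]
share_yes = sum(codes) / len(codes)
print(codes, share_yes)
```

The same function can code other pairs, for example `code_dichotomous("Premium", positive="premium", negative="normal")` for the petrol question.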


ii. Multiple-choice questions: Unlike dichotomous questions, the person is given 
a number of response alternatives here. He might be asked to choose the one that 
is most applicable. For example, this question was given to a retailer who is currently 
not selling organic food products: 


Will you consider selling organic food products in your store? 
• Definitely not in the next one year 
• Probably not in the next one year 
• Undecided 
• Probably in the next one year 
• Definitely in the next one year 


Sometimes, multiple-choice questions do not have verbal but rather numerical 
options for the respondent to choose from, for example: 


How much do you spend on grocery products (average in one month)? 
Less than ₹2500/- 
Between ₹2500-5000/- 
More than ₹5000/- 

Most multiple-choice questions are based upon ordinal or interval level of 
measurement. There could also be instances when multiple options are given to 
the respondent and he can select all those that apply in the case. These kinds of 
multiple-choice questions are called checklists. For example, in the organic food 
study, the retailer who does not stock organic products was given multiple reasons 
as follows: 


You do not currently sell organic food products because (Could be ≥ 1) 
• You do not know about organic food products. 
• You are not interested. 
• Organic products do not have attractive packaging. 
• Organic food products are not supplied regularly. 
• Any other 
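
A checklist differs from an ordinary multiple-choice question in that each option is effectively its own yes/no item. One common way to code it, sketched here with hypothetical option labels, is one 0/1 indicator per option.

```python
# Hypothetical short labels for the five reasons in the checklist above.
OPTIONS = [
    "not_aware",
    "not_interested",
    "unattractive_packaging",
    "irregular_supply",
    "other",
]


def code_checklist(selected: set[str]) -> dict[str, int]:
    """Return a 0/1 indicator for every option; several may be 1 at once."""
    unknown = selected - set(OPTIONS)
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    return {option: int(option in selected) for option in OPTIONS}


# A retailer who ticked two reasons.
coded = code_checklist({"not_aware", "irregular_supply"})
print(coded)
```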


iii. Scales: Scales refer to the attitudinal scales that were discussed in detail in 
Unit 5, so we will only illustrate them here with an example. The following is a 
question which has two sub-questions designed on the Likert scale. These require 
simple agreement or disagreement on the part of the respondent. This scale is 
based on the interval level of measurement. 


Given below are statements related to your organization. Please indicate your 
agreement/disagreement with each: 


(1 = Strongly Disagree ... 5 = Strongly Agree) 


1. The people in my company know their roles very clearly.    1  2  3  4  5 
2. I want to complete my current task by hook or by crook.    1  2  3  4  5 
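
Since the text treats the Likert scale as interval-level, responses to a statement can be summarized by their mean. The following is a minimal sketch; the ratings are invented for illustration and the function name is hypothetical.

```python
from statistics import mean


def mean_likert(ratings: list[int], low: int = 1, high: int = 5) -> float:
    """Average a list of Likert ratings after checking they lie on the scale."""
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"Rating {rating} is outside {low}-{high}")
    return mean(ratings)


# Hypothetical ratings from five respondents for statement 1.
hypothetical_ratings = [4, 5, 3, 4, 2]
print(mean_likert(hypothetical_ratings))
```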


6. Criteria for question designing 


Step six of the questionnaire design process involves translating the information 
needs identified into meaningful questions. There are certain designing criteria that 
a researcher should keep in mind when writing the questions. 


Clearly specify the issue: By reading the question, the person should be able to 
clearly understand the information need. 


Which newspaper do you read? (Incorrect) 


This might seem to be a well-defined and structured question. However, 
the ‘you’ could be the person filling in the questionnaire or his family. He could be 
reading more than one newspaper, and he might read different papers at home and, 
say, in the college library. A better way to word the question would be: 


Which newspaper or newspapers did you personally read at home during 
the last month? In case of more than one newspaper, please list all that you read. 
(Correct) 


Use simple terminology: The researcher must take care to ask questions in a 
language that is understood by the population under study. Technical words or 
difficult words that are not used in everyday communication must be avoided. 


Do you think thermal wear provides immunity? (Incorrect) 


Do you think that thermal wear provides you protection from the cold? 
(Correct) 


Avoid ambiguity in questioning: The words used in the questionnaire should 
mean the same thing to all those answering the questionnaire. A lot of words are 
subjective and relative in meaning. Consider the following question: 


How often do you visit Pizza Hut? 
Never 
Occasionally 
Sometimes 
Often 
Regularly (Incorrect) 


These are ambiguous measures, as ‘occasionally’ in the above question 
might be three to four times in a week for one person, while for another it could 
be three times in a month. A much better wording for this question would be the 
following: 


In a typical month, how often do you visit Pizza Hut? 
Less than once 
1 or 2 times 
3 or 4 times 
More than 4 times (Correct) 
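
One way to see why the reworded categories are unambiguous is that any whole number of visits falls into exactly one of them. The sketch below assumes visits are counted as whole numbers per month, so that "Less than once" corresponds to zero visits; the function name is hypothetical.

```python
def visit_category(visits_per_month: int) -> str:
    """Map a monthly visit count to exactly one response category."""
    if visits_per_month < 0:
        raise ValueError("The number of visits cannot be negative")
    if visits_per_month == 0:
        return "Less than once"
    if visits_per_month <= 2:
        return "1 or 2 times"
    if visits_per_month <= 4:
        return "3 or 4 times"
    return "More than 4 times"


for visits in (0, 2, 3, 12):
    print(visits, "->", visit_category(visits))
```

By contrast, words such as "occasionally" or "often" cannot be mapped to a count in this way, which is the ambiguity the text warns about.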


Avoid leading questions: Any question that provides a clue to the respondents 
in terms of the direction in which one wants them to answer is called a leading or 
biasing question. For example, ‘Do you think that working mothers should buy 
ready-to-eat food when that might contain some chemical preservatives?’ 


Yes 
No 
Don’t know (Incorrect) 


The question would mostly generate a negative answer, as no working mother 
would like to buy something that is convenient but might be harmful. Thus, it is 
advisable to construct a neutral question as follows: 


Do you think that working mothers should buy ready-to-eat food? 
Yes 
No 
Don’t know (Correct) 


Avoid loaded questions: Questions that address sensitive issues are termed 
loaded questions, and the response to these questions might not always be honest, 
as the person might not wish to admit the answer. For example, questions such as 
the following will rarely get an affirmative answer: 

Will you take dowry when you get married? (Incorrect) 


Sensitive questions like this can be rephrased in a variety of ways. For example, 
the question could be constructed in the context of a third person as follows: 


Do you think most Indian men would take dowry when they get married? 
(Correct) 


Avoid double-barrelled questions: These are questions that contain two separate 
issues joined by an ‘or’ or ‘and’, like the following: 


Do you think Nokia and Samsung have a wide variety of touch 
phones? Yes/No (Incorrect) 


The problem is that the respondent might believe that only Nokia has a wide 
variety of phones, or only Samsung, or both. These questions are referred to as 
double-barrelled and the researcher should always split them into two separate 
questions. For example, 


A wide variety of touch phones is available for: 
Nokia 
Samsung 
Both (Correct) 


7. Determine the questionnaire structure 


The questions now have to be put together in a proper sequence. 


Instructions: Questionnaires, including schedules, always begin with 
standardized instructions. These begin by greeting the respondent and then 
introducing the researcher and the purpose of questionnaire administration. 
For example, in the study on organic food products, the following instructions 
were given at the beginning of the questionnaire: 


‘Hi. We are carrying out a market research on the purchase 
behaviour of grocery products/organic food. We are conducting a survey of 
consumers, retailers and experts in the NCR for the same. 


As you are involved in the purchase and/or consumption of food products, 
we seek your cooperation for providing the following relevant information for our 
research. Thank you very much.’ 


Opening questions: After the instructions come the opening questions, which lead 
the reader into the study topic. For example, a questionnaire on understanding the 
consumer’s buying behaviour in malls can ask an opening question that is generic in 
nature, such as: 


What is your opinion about shopping at a mall? 


Study questions: After the opening questions, the bulk of the instrument needs to 
be devoted to the main questions that are related to the specific information needs 
of the study. Here also, the general rule is that the simpler questions, which do not 
require a lot of thinking or response time, should be asked first as they build the 
tempo for answering the more difficult/sensitive questions later on. This method of 
going in a sequential manner from the general to the specific is called the funnel 
approach. 


Classification information: This is the information that is related to the basic 
socio-economic and demographic traits of the person. These might include name 
(kept optional in some cases), address, e-mail address and telephone number. 


Acknowledgement: The questionnaire ends by acknowledging the inputs of the 
respondent and thanking him for his cooperation and valuable contribution. 


8. Physical characteristics of the questionnaire 


The researcher must pay special attention to the look of the questionnaire. The 
first consideration is the paper on which the questionnaire is printed, which 
should be of good quality. The font style and spacing used in the entire document 
should be uniform. One must ensure that every question and its response options 
are printed on the same page. Surveys for different groups could be on different 
coloured paper. For example, if Delhi is being studied as five zones, then the 
questionnaire used in each zone could be printed on a differently coloured paper. 
Each question and section must be numbered properly. In case there is any response 
instruction for an individual question, it must appear before the question. In case the 
questionnaire is going to be administered by the investigator and there are any 
probing questions, they should be clearly written as instructions for the 
investigator. 


9. Pilot testing of the questionnaire 


Pilot testing refers to testing and administering the designed instrument on a small 
group of people from the population under study. This is essentially to uncover any 
errors that might still remain even after the earlier eight steps. For example, 
the question wording may not be clear, the sequence of questions may not be 
correct or a question may not be needed as it does not serve any purpose. These 
aspects need to be corrected. Every aspect of the questionnaire has to be tested 
and one must record all the experiences of the administration, including the time 
taken to administer it. Sometimes, the researcher might also get the questionnaire 
vetted by academic or industry experts for their inputs. As far as possible, the 
pilot should be a small-scale replica of the actual survey that would be subsequently 
conducted. 


10. Administering the questionnaire 


Once all the nine steps have been completed, the final instrument is ready for 
use and the questionnaire needs to be administered according to the 
sampling plan. 
6.4 ADVANTAGES AND DISADVANTAGES OF THE 
QUESTIONNAIRE METHOD 


The questionnaire has many advantages over the other data collection methods 
discussed earlier. 


• Probably the greatest benefit of the method is its adaptability. There is, 
actually speaking, no domain or branch for which a questionnaire cannot 
be designed. It can be shaped in a manner that can be easily understood by 
the population under study. The language, the content and the manner of 
questioning can be modified suitably. The instrument is particularly suitable 
for studies that are trying to establish the reasons for certain occurrences or 
behaviour. 


• The second advantage is that it assures anonymity if it is self-administered 
by the respondent, as there is no pressure or embarrassment in revealing 
sensitive data. A lot of questionnaires do not even require the person to fill 
in his/her name. Administering the questionnaire is much faster and less 
expensive as compared to other primary and a few secondary sources as 
well. There is considerable ease of quantitative coding and analysis of the 
obtained information as most response categories are closed-ended and 
based on the measurement levels as discussed in Unit 5. The chance of 
researcher bias is very little here. 


• Further, there is no pressure of immediate response, thus the subject can fill 
in the questionnaire whenever he or she wants. 


• The questionnaire is the most economical method as it can be administered 
simultaneously to a number of respondents. Thus a large amount of data 
can be collected within a short time through a questionnaire. 


However, the method does not come without any disadvantages. 


• The major disadvantage is that the inexpensive standardized instrument has 
limited applicability, that is, it can be used only with those who can read and 
write. 


• The questionnaire is an impersonal method and sometimes for a sensitive 
issue it may not reveal the actual reasons or answers to the questions that 
you asked. The return ratio, i.e., the number of people who return the duly 
filled in questionnaires, is sometimes not even 50 per cent of the number of 
forms distributed. 


• Skewed sample response could be another problem. This can occur in two 
cases: one, if the investigator distributes the questionnaire to his friends and 
acquaintances, and second, because of the self-selection of the subjects. 
This means that the ones who fill in the questionnaire and return it might not 
be representative of the population at large. In case the person is not 
clear about a question, clarification with the researcher might not be possible. 
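
The return ratio mentioned in the list above is a simple percentage. The sketch below computes it for invented figures; the function name and the numbers are hypothetical.

```python
def return_ratio(returned: int, distributed: int) -> float:
    """Percentage of distributed questionnaires that came back duly filled in."""
    if distributed <= 0:
        raise ValueError("At least one questionnaire must be distributed")
    if not 0 <= returned <= distributed:
        raise ValueError("Returned forms must be between 0 and the number distributed")
    return 100 * returned / distributed


# Hypothetical study: 400 forms distributed, 180 returned.
print(return_ratio(180, 400))
```

With these illustrative figures the return ratio is below 50 per cent, which is the situation the text describes.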
Check Your Progress 


4. What are some of the other names for open-ended questions? 
5. Name the method in which sampling control is the highest. 
6. ‘Do you sing and dance?’ is an example of which type of question? 
7. State the greatest benefit of the questionnaire method. 


6.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. The first and foremost requirement of a questionnaire is that the spelt-out 
research objectives must be converted into clear questions which will extract 
answers from the respondent. 

2. The non-formalized and concealed questionnaire is used if the objective of 
the research is to uncover socially unacceptable desires and subconscious 
and unconscious motivations. 

3. The categories of questionnaires on the basis of method of administration 
are schedule and self-administered questionnaire. 

4. The other names for open-ended questions are unstructured questions or 
free-response or free-answer questions. 

5. Schedule is the method in which sampling control is the highest. 

6. ‘Do you sing and dance?’ is an example of a double-barrelled question. 

7. The greatest benefit of the questionnaire method is its adaptability. 


6.6 SUMMARY 


• The questionnaire is a research technique that consists of a series of questions 
asked to respondents, in order to obtain statistically useful information about 
a given topic. 

• It is one of the most cost-effective methods of collecting primary data, which 
has the advantage of flexibility of approach and can be successfully adapted 
for most research studies. 

• There are many different types of questionnaire available to the researcher. 

• Based on the structure, questionnaires can be categorized into unconcealed 
and formalized, concealed and formalized, unconcealed and non-formalized 
and concealed and non-formalized. 

• Based on the method of administration, the questionnaire could be in the 
form of a schedule or self-administered questionnaire. 
6.7 KEY WORDS 


• Questionnaire: A research tool that consists of a series of questions asked 
to respondents, in order to obtain statistically useful information about a 
given topic. 

• Schedule: Questionnaire with a face-to-face interaction in which the 
interviewer reads out each question and makes a note of the respondent’s 
answers. 

• Dichotomous questions: Questions with restrictive alternatives that provide 
the respondents only with two answers. 

• Double-barrelled questions: Questions that have two separate options 
separated by an ‘or’ or ‘and’. 


6.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What is a questionnaire? What are the criteria of a sound questionnaire? 
2. Write short notes on the following: 
(a) Formalized and concealed questionnaire 
(b) Non-formalized and unconcealed questionnaire 
(c) Non-formalized and concealed questionnaire 
3. What are the different criteria for designing questions in a questionnaire? 
Long-Answer Questions 
1. What are the steps involved in the questionnaire design? Explain in detail 
the questionnaire design process. 


2. What are the advantages and disadvantages of the questionnaire method? 
Illustrate with suitable examples. 
UNIT 7 SAMPLING 


Structure 
7.0 Introduction 
7.1 Objectives 
7.2 Sampling Concepts 
7.2.1 Sample vs Census 
7.2.2 Sampling vs Non-Sampling Error 
7.3 Sampling Design 
7.0 INTRODUCTION 


In Unit 5, we discussed the concept of attitude measurement and scaling. In this 
unit, we will discuss an important aspect of research: sampling. Let us understand 
what sampling is and what role it plays in research. 


As we have discussed earlier, research objectives are generally translated 
into research questions that enable the researchers to identify the information needs. 
Once the information needs are specified, the sources of collecting the information 
are sought. Some of the information may be collected through secondary sources 
(published material), whereas the rest may be obtained through primary sources. 
The primary methods of collecting information could be the observation method, 
personal interview with questionnaire (which we learnt in the previous unit), telephone 
surveys and mail surveys. Surveys are, therefore, useful in information collection, 
and their analysis plays a vital role in finding answers to research questions. Survey 
respondents should be selected using the appropriate procedures; otherwise the 
researchers may not be able to get the right information to solve the problem 
under investigation. This is done through sampling. 


In this unit, we will discuss in detail the concept of sampling, including sampling 
and non-sampling error, probability and non-probability sampling designs, as well 
as determination of sample size. 
7.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Explain the basic concepts of sampling. 
• Distinguish between sample and census. 
• Differentiate between a sampling and non-sampling error. 
• Describe the meaning of sampling design. 
• Explain different types of probability sampling designs. 
• Describe various types of non-probability sampling designs. 
• Estimate the sample size required while estimating the population mean and 
proportion. 


7.2 SAMPLING CONCEPTS 


The process of selecting the right individuals, objects or events for a study is 
known as sampling. Sampling involves the study of a small number of individuals 
or objects chosen from a larger group. Before we get into the details of various 
issues pertaining to sampling, it would be appropriate to discuss some of the 
sampling concepts. 


Population: Population refers to any group of people or objects that form the 
subject of study in a particular survey and are similar in one or more ways. For 
example, the number of full-time MBA students in a business school could form 
one population. If there are 200 such students, the population size would be 200. 
We may be interested in understanding their perceptions about business education. 
If, in an organization there are 1,000 engineers, out of which 350 are mechanical 
engineers and we are interested in examining the proportion of mechanical engineers 
who intend to leave the organization within six months, all the 350 mechanical 
engineers would form the population of interest. If the interest is in studying how 
the patients in a hospital are looked after, then all the patients of the hospital would 
fall under the category of population. 


Element: An element comprises a single member of the population. Out of the 
350 mechanical engineers mentioned above, each mechanical engineer would form 
an element of the population. 


Sampling frame: A sampling frame comprises all the elements of a population, with 
proper identification, that are available to us for selection at any stage of sampling. 
For example, the list of registered voters in a constituency, the telephone directory, 
the list of students registered with a university, the attendance sheet of a particular 
class and the payroll of an organization are examples of sampling frames. When the 
population size is very large, it becomes virtually impossible to form a sampling 
frame. For instance, the number of consumers of soft drinks is very large and, 
therefore, it becomes very difficult to form the sampling frame for the same. 


Sample: It is a subset of the population. It comprises only some elements of the 
population. If out of the 350 mechanical engineers employed in an organization, 
30 are surveyed regarding their intention to leave the organization in the next six 
months, these 30 members would constitute the sample. 


Sampling unit: A sampling unit is a single member of the sample. If a sample of 
50 students is taken from a population of 200 MBA students in a business school, 
then each of the 50 students is a sampling unit. 
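
The terms population, element, sample and sampling unit can be illustrated with the mechanical engineers example. The sketch below draws 30 of 350 elements by simple random selection, which is only one of several possible sampling designs; the employee IDs and the seed are hypothetical.

```python
import random

# Population: 350 mechanical engineers, each element given a made-up ID.
population = [f"ME-{number:03d}" for number in range(1, 351)]

# A fixed seed is used only so that the illustration is repeatable.
rng = random.Random(42)

# Sample: 30 elements drawn without replacement; each one is a sampling unit.
sample = rng.sample(population, k=30)

print(len(population), len(sample))
```

Here `population` also plays the role of the sampling frame, since it lists every element with proper identification.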


Sampling: It is a process of selecting an adequate number of elements from the 
population so that the study of the sample will not only help in understanding the 
characteristics of the population but also enable us to generalize the results. We 
will see later that there are two types of sampling designs: probability sampling 
design and non-probability sampling design. 


Census (or complete enumeration): An examination of each and every element 
of the population is called census or complete enumeration. Census is an alternative 
to sampling. We will discuss the inherent advantages of sampling over a complete 
enumeration later. 


7.2.1 Sample vs Census 


In a research study, we are generally interested in studying the characteristics of a 
population. Suppose there are 2 lakh households in a town, and we are interested 
in estimating the proportion of households that spend their summer vacations in a 
hill station. This information can be obtained by asking every household in that 
town. If all the households in a population are asked to provide information, such 
a survey is called a census. There is an alternative way of obtaining the same 
information, by choosing a subset of all the two lakh households and asking them 
for the same information. This subset is called a sample. Based upon the information 
obtained from the sample, a generalization about the population characteristic could 
be made. However, that sample has to be representative of the population. For a 
sample to be representative of the population, the distribution of sampling units in 
the sample has to be in the same proportion as the elements in the population. For 
example, if in a town there are 50, 35 and 15 per cent households in lower, middle 
and upper income groups, then a sample taken from this population should have 
the same proportions for it to be representative. There are several advantages 
of sample over census. 


• A sample saves time and cost. Many times a decision-maker may not have
enough time to wait till all the information is available. Therefore, a
sample could come to his rescue.


• There are situations where a sample is the only option. When we want to
estimate the average life of fluorescent bulbs, the bulbs are
burnt out completely. If we go for a complete enumeration, there would not
be anything left for use. Another example could be testing the quality of a
photographic film.


• The study of a sample instead of complete enumeration may, at times,
produce more reliable results. This is because by studying a sample, fatigue
is reduced and fewer errors occur while collecting the data, especially when
a large number of elements are involved.


A census is appropriate when the population size is small, e.g., the number
of public sector banks in the country. Suppose the researcher is interested in
collecting information from the top management of a bank regarding their views
on the monetary policy announced by the Reserve Bank of India (RBI). In this
case, a complete enumeration may be possible as the population size is not very
large.


7.2.2 Sampling vs Non-Sampling Error 


There are two types of error that may occur while we are trying to estimate the 
population parameters from the sample. These are called sampling and non-sampling 
errors. 


Sampling error: This error arises when a sample is not representative of the 
population. It is the difference between sample mean and population mean. The 
sampling error reduces with the increase in sample size as an increased sample 
may result in increasing the representativeness of the sample. 
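

The claim that sampling error shrinks as the sample grows can be checked with a small simulation. The sketch below is only an illustration: the population values, the seeds and the helper name mean_abs_sampling_error are assumptions made for this example and do not come from the text.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials=300, seed=0):
    """Average |sample mean - population mean| over repeated simple random samples."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)  # drawn without replacement
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

# Hypothetical population of 10,000 values (arbitrary units)
pop_rng = random.Random(42)
population = [pop_rng.gauss(500, 100) for _ in range(10_000)]

# Average sampling error for three sample sizes
errors_by_n = {n: mean_abs_sampling_error(population, n) for n in (10, 100, 1000)}
for n, err in errors_by_n.items():
    print(n, round(err, 2))
```

Under these assumptions the average error should fall as the sample size rises from 10 to 100 to 1,000. Non-sampling errors are not affected by sample size in this way.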


Non-sampling error: This error arises not because a sample is not representative
of the population but because of other reasons. Some of these reasons are listed
below:


• The respondents, when asked for information on a particular variable, may
not give the correct answers. If a person aged 48 is asked a question about
his age, he may indicate the age to be 36, which may result in an error
in estimating the true value of the variable of interest.


• The error can arise while transferring the data from the questionnaire to the
spreadsheet on the computer.


• There can be errors at the time of coding, tabulation and computation.


• If the population of the study is not properly defined, it could lead to errors.


• The chosen respondent may not be available to answer the questions or
may refuse to be part of the study.


Check Your Progress 


1. What is the subset of a population called? 


2. Define a sampling frame. 


7.3 SAMPLING DESIGN 


Sampling design refers to the process of selecting samples from a population.
There are two types of sampling designs: probability sampling design and
non-probability sampling design. Probability sampling designs are used in conclusive
research. In a probability sampling design, each and every element of the population
has a known chance of being selected in the sample. The known chance does not
mean equal chance. Simple random sampling is a special case of probability
sampling design where every element of the population has both a known and an equal
chance of being selected in the sample.


In case of non-probability sampling design, the elements of the population
do not have any known chance of being selected in the sample. These sampling
designs are used in exploratory research.


7.3.1 Probability Sampling Design 


Under this, the following sampling designs would be covered—simple random 
sampling with replacement (SRSWR), simple random sampling without replacement 
(SRSWOR), systematic sampling and stratified random sampling. 


Simple random sampling with replacement (SRSWR) 


Under this scheme, a list of all the elements of the population from where the
samples are to be drawn is prepared. If there are 1,000 elements in the population,
we write the identification number or the name of all the 1,000 elements on 1,000
different slips. These are put in a box and shuffled properly. If there are 20 elements
to be selected from the population, the simple random sampling procedure involves
selecting a slip from the box and reading off the identification number. Once this is
done, the chosen slip is put back in the box and again a slip is picked up and the
identification number is read from that slip. This process continues till a sample of
20 is selected. Please note that the first element is chosen with a probability of
1/1,000. The second one is also selected with the same probability and so are all the
subsequent elements of the sample.


Simple random sampling without replacement (SRSWOR) 


In case of simple random sample without replacement, the procedure is identical 
to what was explained in the case of simple random sampling with replacement. 
The only difference here is that the chosen slip is not placed back in the box. This 
way, the first unit would be selected with the probability of 1/1,000, second unit 
with the probability of 1/999, the third will be selected with a probability of 
1/998 and so on, till we select the required number of elements (in this case, 20) in 
our sample. 
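

The two slip-drawing schemes can be imitated with Python's standard random module. This is a minimal sketch and not a prescribed procedure: the seed and the variable names are assumptions chosen for illustration, and the identification numbers 1 to 1,000 stand in for the slips.

```python
import random

rng = random.Random(7)               # fixed seed so the draw is repeatable
population = list(range(1, 1001))    # identification numbers 1..1000
n = 20

# SRSWR: every draw is made from the full box, so a slip can repeat
srswr = [rng.choice(population) for _ in range(n)]

# SRSWOR: a drawn slip is not returned, so all selected units are distinct
srswor = rng.sample(population, n)

# Chance attached to each successive draw without replacement,
# given the slips already removed: 1/1000, 1/999, 1/998, ...
draw_probabilities = [1 / (len(population) - i) for i in range(n)]
```

With replacement, the list srswr may contain the same identification number more than once. Without replacement, srswor always holds 20 different numbers.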


The simple random sampling (with or without replacement) is not used in
consumer research. This is because in consumer research the population size is
usually very large, which creates problems in the preparation of a sampling frame.
For example, the number of consumers of soft drinks, pizza, shampoo, soap, chocolate,
etc., is very large. However, these (SRSWR and SRSWOR) designs could be
useful when the population size is very small, for example, the number of
steel/aluminum-producing companies in India and the number of banks in India. Since
the population size is quite small, the preparation of a sampling frame does not
create any problem.


Another problem with these (SRSWR and SRSWOR) designs is that we 
may not get a representative sample using such a scheme. Consider an example of a 
locality having 10,000 households, out of which 5,000 belong to low-income group, 
3,500 belong to middle income group and the remaining 1,500 belong to high- 
income group. Suppose it is decided to take a sample of 100 households using the 
simple random sampling. The selected sample may not contain even a single household 
belonging to the high- and middle-income group and only the low-income households 
may get selected, thus, resulting in a non-representative sample. 


Systematic sampling 


Systematic sampling takes care of the limitation of the simple random sampling 
that the sample may not be a representative one. In this design, the entire population 
is arranged in a particular order. The order could be the calendar dates or the 
elements of a population arranged in an ascending or a descending order of the 
magnitude which may be assumed as random. List of subjects arranged in the 
alphabetical order could also be used and they are usually assumed to be random 
in order. Once this is done, the steps followed in the systematic sampling design 
are as follows: 


• First of all, a sampling interval given by K = N/n is calculated,
where N = the size of the population and n = the size of the sample.
The sampling interval K should be an integer. If it is not, it is
rounded off to make it an integer.


• A random number is selected from 1 to K. Let us call it C.


• The first element to be selected from the ordered population would be C,
the next element would be C + K and the subsequent one would be C + 2K
and so on till a sample of size n is selected.


This way we can get representation from all the classes in the population 
and overcome the limitations of the simple random sampling. To take an example, 
assume that there are 1,000 grocery shops in a small town. These shops could be 
arranged in an ascending order of their sales, with the first shop having the smallest 
sales and the last shop having the highest sales. If it is decided to take a sample of 
50 shops, then our sampling interval K will be equal to 1000 ÷ 50 = 20. Now we
select a random number from 1 to 20. Suppose the chosen number is 10. This
means that the shop number 10 will be selected first and then shop number 10 +
20 = 30 and the next one would be 10 + (2 × 20) = 50 and so on till all the 50
shops are selected. This way we can get a representative sample in the sense that 
it will contain small, medium and large shops. 
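

The steps of systematic sampling can be sketched in a few lines of Python. The function name systematic_sample is hypothetical, and the sketch assumes that the interval K is taken as the integer part of N/n (one way of making it an integer that never runs past the end of the list) and that the list is already arranged in the desired order.

```python
import random

def systematic_sample(ordered_population, n, rng=None):
    """Select every K-th element after a random start C between 1 and K."""
    rng = rng or random.Random()
    N = len(ordered_population)
    K = N // n                 # sampling interval; assumes n <= N
    C = rng.randint(1, K)      # random start, both ends inclusive
    # Positions C, C + K, C + 2K, ... are 1-based, hence the "- 1" for list indexing
    return [ordered_population[C - 1 + i * K] for i in range(n)]

shops = list(range(1, 1001))   # shop numbers, assumed already sorted by sales
sample = systematic_sample(shops, 50, random.Random(1))
print(sample[:3])
```

For the grocery shop illustration, the selected shop numbers are always 20 apart, so the sample spans the whole range from the smallest to the largest shops.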


It may be noted that in a systematic sampling the first unit of the sample is 
selected at random (probability sampling design) and having chosen this, we have 
no control over the subsequent units of sample (non-probability sampling). Because 
of this, this design at times is called mixed sampling. 


The main advantage of systematic sampling design is its simplicity. When 
sampling from a list of population arranged in a particular order, one can easily 
choose a random start as described earlier. After having chosen a random start,
every Kth item can be selected instead of going for a simple random selection. This
design is statistically more efficient than a simple random sampling, provided the 
condition of ordering of the population is satisfied. 


The use of systematic sampling is quite common as it is easy and cheap to 
select a systematic sample. In systematic sampling one does not have to jump 
back and forth all over the sampling frame wherever random number leads and 
neither does one have to check for duplication of elements as compared to simple 
random sampling. Another advantage of a systematic sampling over simple random
sampling is that one does not require a complete sampling frame to draw a systematic
sample. The investigator may be instructed to interview every 10th customer entering
a mall without a list of all customers.


Stratified random sampling 


Under this sampling design, the entire population (universe) is divided into strata 
(groups), which are mutually exclusive and collectively exhaustive. By mutually 
exclusive, it is meant that if an element belongs to one stratum, it cannot belong to
any other stratum. Strata are collectively exhaustive if all the elements of various 
strata put together completely cover all the elements of the population. The elements 
are selected using a simple random sampling independently from each group. 


There are two reasons for using a stratified random sampling rather than 
simple random sampling. One is that the researchers are often interested in obtaining 
data about the component parts of a universe. For example, the researcher may 
be interested in knowing the average monthly sales of cell phones in ‘large’, ‘medium’ 
and ‘small’ stores. In such a case, separate sampling from within each stratum 
would be called for. The second reason for using a stratified random sampling is 
that it is more efficient as compared to a simple random sampling. This is because
dividing the population into various strata increases the representativeness of the
sample, as the elements within each stratum are homogeneous.


There are certain issues that may be of interest while setting up a stratified 
random sample. These are: 


• What criteria should be used for stratifying the universe
(population)?


The criteria for stratification should be related to the objectives of the
study. The entire population should be stratified in such a way that the
elements are homogeneous within the strata, whereas there should be
heterogeneity between strata. As an example, if the interest is to estimate
the expenditure of households on entertainment, the appropriate criterion
for stratification would be the household income. This is because the
expenditure on entertainment and household income are highly correlated.
Generally, stratification is done on the basis of demographic variables
like age, income, education and gender. Customers are usually stratified
on the basis of life stages and income levels to study their buying patterns.
Companies may be stratified according to size, industry and profits for
analysing the stock market reactions.


• How many strata should be constructed?


Going by common sense, as many strata as possible should be used so 
that the elements of each stratum will be as homogeneous as possible. 
However, it may not be practical to increase the number of strata and, 
therefore, the number may have to be limited. Too many strata may 
complicate the survey and make preparation and tabulation difficult. 
Costs of adding more strata may be more than the benefit obtained. 
Further, the researcher may end up with the practical difficulty of 
preparing a separate sampling frame as the simple random samples are 
to be drawn from each stratum. 


• What should be the appropriate sample size to be taken from
each stratum?


This question pertains to the number of observations to be taken out 
from each stratum. At the outset, one needs to determine the total sample 
size for the universe and then allocate it between each stratum. This may 
be explained as follows: 


Let there be a population of size $N$. Let this population be divided into
three strata based on a certain criterion. Let $N_1$, $N_2$ and $N_3$ denote the
sizes of strata 1, 2 and 3 respectively, such that $N = N_1 + N_2 + N_3$. These
strata are mutually exclusive and collectively exhaustive. Each of these
three strata could be treated as a separate population. Now, if a total sample
of size $n$ is to be taken from the population, the question arises as to how
much of the sample should be taken from strata 1, 2 and 3 respectively,
so that the sample sizes from the strata add up to $n$.


Let the sizes of the samples from the first, second and third strata be $n_1$, $n_2$
and $n_3$ respectively, such that $n = n_1 + n_2 + n_3$. Then, there are two
schemes that may be used to determine the values of $n_i$ ($i = 1, 2, 3$)
for each stratum. These are the proportionate and disproportionate allocation
schemes.


Proportionate allocation scheme: In this scheme, the size of the sample in each
stratum is proportional to the size of the population of the stratum. For example, if a
bank wants to conduct a survey to understand the problems that its customers are
facing, it may be appropriate to divide them into three strata based upon the size
of their deposits with the bank. Suppose the bank has 10,000 customers, of whom
1,500 are big account holders (having deposits of more than ₹10
lakh), 3,500 are medium-sized account holders (having deposits of more
than ₹2 lakh but less than ₹10 lakh) and the remaining 5,000 are small account holders
(having deposits of less than ₹2 lakh). Suppose the total budget for sampling is
fixed at ₹20,000 and the cost of sampling a unit (customer) is ₹20. If a sample of
100 is to be chosen from all the three strata, the size of the sample from stratum 1
would be:
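$$n_1 = n \times \frac{N_1}{N} = 100 \times \frac{1500}{10000} = 15$$


The size of the sample from stratum 2 would be:


$$n_2 = n \times \frac{N_2}{N} = 100 \times \frac{3500}{10000} = 35$$


The size of the sample from stratum 3 would be:


$$n_3 = n \times \frac{N_3}{N} = 100 \times \frac{5000}{10000} = 50$$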


This way the size of the sample chosen from each stratum is proportional to 
the size of the stratum. Once we have determined the sample size from each stratum, 
one may use the simple random sampling or the systematic sampling or any other 
sampling design to take out samples from each of the strata. 
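

The proportionate allocation for the bank illustration can be reproduced with a short function. This is a sketch under stated assumptions: the function name and the stratum labels are hypothetical, and the result is rounded to whole customers, which in general may leave the rounded figures a unit or two away from the intended total n.

```python
def proportionate_allocation(stratum_sizes, n):
    """Return n_i = n * N_i / N for each stratum, rounded to whole units."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

# Stratum sizes from the bank illustration (labels are illustrative)
strata = {"big": 1500, "medium": 3500, "small": 5000}
allocation = proportionate_allocation(strata, 100)
print(allocation)  # {'big': 15, 'medium': 35, 'small': 50}
```

Once the sizes are fixed, units within each stratum can be drawn independently, for instance by simple random or systematic sampling.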


Disproportionate allocation: As per the proportionate allocation explained
above, the sizes of the samples from strata 1, 2 and 3 are 15, 35 and 50 respectively.
As the cost of sampling a unit is ₹20 irrespective of the stratum
from which the sample is drawn, the bank would naturally be more interested in
drawing a large sample from stratum 1, which has the big customers, as it gets
most of its business from stratum 1. In other words, the bank may follow a
disproportionate allocation of sample as the importance of each stratum is not the
same from the point of view of the bank. The bank may like to take a sample of 45
from stratum 1 and 40 and 15 from strata 2 and 3 respectively. Also, a larger sample
may be desired from the strata having more variability.


7.3.2 Non-probability Sampling Designs 


Under the non-probability sampling, the following designs would be considered:
convenience sampling, purposive (judgemental) sampling, quota sampling and snowball sampling.


Convenience sampling 


Convenience sampling is used to obtain information quickly and inexpensively. 
The only criterion for selecting sampling units in this scheme is the convenience of 
the researcher or the investigator. Mostly, the convenience samples used are 
neighbours, friends, family members, colleagues and ‘passers-by’. This sampling
design is often used in the pre-test phase of a research study such as the
pre-testing of a questionnaire. Some examples of convenience sampling are:


• People interviewed in a shopping centre for their political opinion for a TV
programme.


• Monitoring the price level in a grocery shop with the objective of inferring
the trends in inflation in the economy.


• Requesting people to volunteer to test products.
• Using students or employees of an organization for conducting an experiment.


In all the above situations, the sampling unit may either be self-selected or 
selected because of ease of availability. No effort is made to choose a representative 
sample. Therefore, in this design the difference between the population value 
(parameters) of interest and the sample value (statistic) is unknown both in terms 
of the magnitude and direction. Therefore, it is not possible to make an estimate of 
the sampling error and researchers would not be able to make a conclusive 
statement about the results from such a sample. For this reason, convenience
sampling should not be used in conclusive research (descriptive and causal
research).


Convenience sampling is commonly used in exploratory research. This is 
because the purpose of an exploratory research is to gain an insight into the problem
and generate a set of hypotheses which could be tested with the help of a conclusive
research. When very little is known about a subject, a small-scale convenience 
sampling can be of use in the exploratory work to help understand the range of 
variability of responses in a subject area. 


Judgemental sampling 


Under judgemental sampling, experts in a particular field choose what they believe 
to be the best sample for the study in question. The judgement sampling calls for 
special efforts to locate and gain access to the individuals who have the required 
information. Here, the judgement of an expert is used to identify a representative 
sample. For example, the shoppers at a shopping centre may serve to represent 
the residents of a city or some of the cities may be selected to represent a country. 
Judgemental sampling design is used when the required information is possessed
by a limited number/category of people. This approach may not empirically produce
satisfactory results and may, therefore, curtail generalizability of the findings due
to the fact that we are using a sample of experts (respondents) that are usually
conveniently available to us. Further, there is no objective way to evaluate the
precision of the results. A company wanting to launch a new product may use
judgemental sampling for selecting ‘experts’ who have prior knowledge or
experience of similar products. A focus group of such experts may be conducted
to get valuable insights. Opinion leaders who are knowledgeable are included in
the organizational context. Enlightened opinions (views and knowledge) constitute
a rich data source. A very special effort is needed to locate and have access to
individuals who possess the required information.


The most common application of judgemental sampling is in business-to- 
business (B to B) marketing. Here, a very small sample of lead users, key accounts 
or technologically sophisticated firms or individuals is regularly used to test new 
product concepts, producing programmes, etc. 


Quota Sampling 


In quota sampling, the sample includes a minimum number from each specified 
subgroup in the population. The sample is selected on the basis of certain 
demographic characteristics such as age, gender, occupation, education, income, 
etc. The investigator is asked to choose a sample that conforms to these parameters. 
Field workers are assigned quotas of the sample to be selected satisfying these 
characteristics. We will discuss quota sampling later on.


Snowball sampling 


Snowball sampling is generally used when it is difficult to identify the members of 
the desired population, e.g., deep-sea divers, families with triplets, people using 
walking sticks, doctors specializing in a particular ailment, etc. Under this design 
each respondent, after being interviewed, is asked to identify one or more others in the
field. This could result in a very useful sample. The main problem is in making the
initial contact. Once this is done, these cases identify more members of the 
population, who then identify further members and so on. It may be difficult to get 
a representative sample. One plausible reason for this could be that the initial 
respondents may identify other potential respondents who are similar to themselves. 
The next problem is to identify new cases. 

Snowball sampling is very suitable in studying social groups, informal groups
in a formal organization and diffusion of information among professionals of various 
kinds. 
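

The referral process behind snowball sampling can be pictured as following a chain of nominations from the initial contacts. The sketch below is purely illustrative: the referral data, the respondent labels and the function name snowball_sample are assumptions, and real referrals are of course collected in the field rather than known in advance.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Follow referrals wave by wave, starting from the initial contacts."""
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(selected) < max_size:
        person = queue.popleft()
        selected.append(person)            # interview this respondent
        for named in referrals.get(person, []):
            if named not in seen:          # skip people already identified
                seen.add(named)
                queue.append(named)
    return selected

# Hypothetical referrals: who each respondent names after the interview
referrals = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": [],
    "E": ["A"],
}
sample = snowball_sample(referrals, seeds=["A"], max_size=4)
print(sample)  # ['A', 'B', 'C', 'D']
```

The sketch also shows why representativeness is hard to secure: everyone in the sample is reached through the first contact, so the sample reflects that contact's circle.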


Advantage 


The main advantage of snowball sampling is that it is useful for smaller populations
for which no frames are readily available. 


Disadvantages 

The following are the disadvantages of this method: 
• It does not allow the use of probability statistical methods.
• It is difficult to apply when the population is large.
• It does not ensure the inclusion of all the elements in the list.


Other types of sampling


Voluntary sample: A voluntary sample is made up of people who self- 
select into the survey. Often, these folks have a strong interest in the main 
topic of the survey. Suppose, for example, that a news show asks viewers 
to participate in an on-line poll. This would be a volunteer sample. The 
sample is chosen by the viewers, not by the survey administrator. 


Multistage sampling: With multistage sampling, we select a sample by 
using combinations of different sampling methods. For example, in Stage 1, 
we might use cluster sampling to choose clusters from a population. Then, 
in Stage 2, we might use simple random sampling to select a subset of 
elements from each chosen cluster for the final sample. 


Replicated or interpenetrating sampling: Replicated sampling involves 
selection of a certain number of sub-samples rather than one full sample 
from a population. All the sub-samples should be drawn using the same 
sampling technique and each is a self-contained and adequate sample of 
the population. Replicated sampling technique can be used with any basic 
sampling technique: simple or stratified, single or multistage or single or 
multi-phase sampling. It provides a simple means of calculating the sampling 
error. It is practical. The replicated samples can throw light on variables 
and non-sampling errors. The only disadvantage is that it limits the amount 
of stratification that can be employed. 


Area sampling: Area sampling is also a form of cluster sampling. In a large
field survey, clusters consisting of specific geographical areas such as districts,
talukas, blocks and villages are randomly drawn. When geographical
areas are selected as sampling units, the sampling is known as area sampling.


Double sampling and multi-phase sampling: Double sampling refers to
the subselection of the final sample from a preselected larger sample that
provides information for improving the final selection. When the procedure
is extended to more than two phases of selection, it is called
multi-phase sampling. This is also known as sequential sampling, as sub-sampling
is done from a main sample in phases. Double sampling or multi-phase
sampling is a compromise solution for a dilemma posed by undesirable
extremes. The statistics based on the sample of ‘n’ can be improved by
using ancillary information from a wide base, but this is too costly to obtain
from the entire population of N elements. Instead, information is obtained
from a larger preliminary sample which includes the final sample n.


Quota Sampling 


As discussed, quota sampling is a form of convenience sampling which involves
selection of quota groups of accessible sampling units by traits such as sex, age,
social class, etc. It is a method of stratified sampling in which the selection within
strata is non-random. It is this non-random element that constitutes its
greatest weakness.


This sampling method is used in studies like marketing surveys, opinion polls
and readership surveys which do not aim at making decisions, but try to get some
crude results quickly.


Advantages 


The following are the advantages of quota sampling:
• It is less costly.
• It takes less time.
• There is no need for a list of the population.
• The field work can easily be organized.


Disadvantages


The following are the disadvantages of quota sampling:
• It is impossible to estimate sampling error.
• Strict control of field work is difficult.
• It is subject to a higher degree of classification error.
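

The way a field worker fills quotas can be sketched as follows. The respondent records, the quota figures and the function name fill_quotas are hypothetical and serve only to show the mechanics: whoever is encountered first is accepted until the quota for that subgroup is met, which is why selection within each subgroup is non-random.

```python
def fill_quotas(respondents, quotas, key):
    """Accept respondents in order of arrival until every subgroup quota is met."""
    remaining = dict(quotas)
    selected = []
    for person in respondents:
        group = key(person)
        if remaining.get(group, 0) > 0:   # quota for this subgroup still open
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):   # all quotas filled, stop interviewing
            break
    return selected

# Hypothetical passers-by met by a field worker, in order of arrival
passers_by = [
    {"id": 1, "gender": "F"}, {"id": 2, "gender": "M"},
    {"id": 3, "gender": "M"}, {"id": 4, "gender": "M"},
    {"id": 5, "gender": "F"}, {"id": 6, "gender": "F"},
]
sample = fill_quotas(passers_by, {"F": 2, "M": 2}, key=lambda p: p["gender"])
selected_ids = [p["id"] for p in sample]
print(selected_ids)  # [1, 2, 3, 5]
```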


Check Your Progress 
3. Name the sampling design in which every element of the population has both a
known and an equal chance of being selected in the sample.


4. Which sampling design is often used in the pre-test phase of a research
study such as the pre-testing of a questionnaire?


5. What is snowball sampling? 


7.4 DETERMINATION OF SAMPLE SIZE 


The size of a sample depends upon the basic characteristics of the population, the 
type of information required from the survey and the cost involved. Therefore, a 
sample may vary in size for several reasons. The size of the population does not 
influence the size of the sample as will be shown later on. 


There are various methods of determining the sample size in practice: 


• Researchers may arbitrarily decide the size of the sample without giving any
explicit consideration to the accuracy of the sample results or the cost of
sampling. This arbitrary approach should be avoided.


• For some projects, the total budget for the field survey (usually
mentioned in the project proposal) is allocated. If the cost of sampling per
sample unit is known, one can easily obtain the sample size by dividing the
total budget allocation by the cost of sampling per unit. This method
concentrates only on the cost aspect of sampling, rather than the value of
information obtained from such a sample.


• There are other researchers who decide on the sample size based on what
was done by other researchers in similar studies. Again, this approach
cannot be a substitute for the formal scientific approach.


• The most commonly used approach for determining the size of the sample is
the confidence interval approach covered under inferential statistics. This
approach is discussed below for determining the size of a sample for
estimating the population mean and the population proportion. In a confidence
interval approach, the following points are taken into account for determining
the sample size in estimation problems involving means:


(a) The variability of the population: The higher the
variability as measured by the population standard deviation, the larger will
be the size of the sample. If the standard deviation of the population is
unknown, a researcher may use the estimates of the standard deviation
from previous studies. Alternatively, the estimates of the population
standard deviation can be computed from the sample data.


(b) The confidence attached to the estimate: How much confidence to
attach to the estimate is a matter of judgement. Assuming
a normal distribution, the higher the confidence the researcher wants
for the estimate, the larger will be the sample size. This is because the value
of the standard normal ordinate ‘Z’ will vary accordingly. For a 90
per cent confidence, the value of ‘Z’ would be 1.645 and for a 95 per
cent confidence, the corresponding ‘Z’ value would be 1.96 and so
on (see Table 7.1). A higher confidence leads to a larger ‘Z’ value.


(c) The allowable error or margin of error: How accurate do we 
want our estimate to be is again a matter of judgement of the 
researcher. It will of course depend upon the objectives of the study 
and the consequence resulting from the higher inaccuracy. If the 
researcher seeks greater precision, the resulting sample size would 
be large.


Table 7.1 Area under Standard Normal Distribution between
the Mean and Successive Values of Z

Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989


7.4.1 Sample Size for Estimating Population Mean 


The formula for determining the sample size in such a case is given by

$$n = \frac{Z^2 \sigma^2}{e^2}$$

where
$e = \bar{X} - \mu$ = Margin of error
$n$ = Sample size
$\sigma$ = Population standard deviation
$Z$ = the value of standard normal ordinate


It may be noted from the above that the size of the sample increases with the
variability in the population and with the value of Z for a confidence interval
(it is proportional to the square of each). It varies inversely with the square of
the size of the error. It may also be noted that the size of a sample does not
depend upon the size of the population.

A worked-out example for the determination of a sample size is given below.

Example 7.1: An economist is interested in estimating the average monthly
household expenditure on food items by the households of a town. Based on past
data, it is estimated that the standard deviation of the population on the monthly
expenditure on food items is ₹30. With allowable error set at ₹7, estimate the
sample size required at a 90 per cent confidence.

Solution: 


90 per cent confidence ⇒ Z = 1.645
e = ₹7
σ = ₹30

$$n = \frac{Z^2 \sigma^2}{e^2} = \frac{(1.645)^2 (30)^2}{(7)^2} = 49.7025 \approx 50$$
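The calculation in Example 7.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sample_size_for_mean` is an assumption made here, and the result is rounded up to the next whole unit, as is done in the example.

```python
import math


def sample_size_for_mean(z, sigma, e):
    """Sample size n = Z^2 * sigma^2 / e^2, rounded up to a whole unit.

    z     : standard normal value for the chosen confidence level
    sigma : population standard deviation
    e     : allowable (margin of) error, in the same units as sigma
    """
    n = (z ** 2) * (sigma ** 2) / (e ** 2)
    return math.ceil(n)


# Example 7.1: 90 per cent confidence (Z = 1.645), sigma = 30, e = 7
n_example_71 = sample_size_for_mean(1.645, 30, 7)
print(n_example_71)
```

Halving the allowable error in the call above roughly quadruples the required sample, which illustrates why greater precision leads to a larger sample size.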


7.4.2 Determination of Sample Size for Estimating the Population 
Proportion 


The formula for determining the sample size in such a case is given by

$$n = \frac{Z^2 p q}{e^2}$$

where q = 1 - p. The above formula will be used if the value of population
proportion (proportion of occurrence of the event) p is known. If, however, p is
unknown, we substitute the maximum value of pq in the above formula. It can be
shown that the maximum value of pq is 1/4 when p = 1/2 and q = 1/2. Therefore,

$$n = \frac{Z^2}{4 e^2}$$


Let us consider two examples for determining a sample size while estimating 
the population proportion. 


Example 7.2: A manager of a department store would like to study women’s
spending per year on cosmetics. He is interested in knowing the population
proportion of women who purchase their cosmetics primarily from his store. If he
wants to have a 90 per cent confidence of estimating the true proportion to be
within ± 0.045, what sample size is needed?


Solution: 
90 per cent confidence ⇒ Z = 1.645
e = ± 0.045

$$n = \frac{1}{4} \cdot \frac{Z^2}{e^2} = \frac{1}{4} \cdot \frac{(1.645)^2}{(0.045)^2} = 334.0772 \approx 335$$


Example 7.3: A consumer electronics company wants to determine the job
satisfaction levels of its employees. For this, they ask a simple question, ‘Are you
satisfied with your job?’ It was estimated that no more than 30 per cent of the
employees would answer yes. What should be the sample size for this company to
estimate the population proportion to ensure a 95 per cent confidence in result,
and to be within 0.04 of the true population proportion?


Solution: 
95 per cent confidence ⇒ Z = 1.96
e = 0.04
p = 0.3
q = 0.7

$$n = \frac{Z^2 p q}{e^2} = \frac{(1.96)^2 \times 0.3 \times 0.7}{(0.04)^2} = 504.21 \approx 505$$
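Examples 7.2 and 7.3 can be reproduced with a short Python sketch. The function name `sample_size_for_proportion` and the convention of passing `p=None` when the proportion is unknown are assumptions made for illustration; in that case the maximum value pq = 1/4 is substituted, as described in the text.

```python
import math


def sample_size_for_proportion(z, e, p=None):
    """Sample size n = Z^2 * p * q / e^2, rounded up to a whole unit.

    If p is unknown (None), the maximum value pq = 1/4 is used,
    which gives the largest (most conservative) sample size.
    """
    pq = 0.25 if p is None else p * (1 - p)
    return math.ceil((z ** 2) * pq / (e ** 2))


# Example 7.2: p unknown, 90 per cent confidence, e = 0.045
n_example_72 = sample_size_for_proportion(1.645, 0.045)

# Example 7.3: p = 0.3, 95 per cent confidence, e = 0.04
n_example_73 = sample_size_for_proportion(1.96, 0.04, p=0.3)

print(n_example_72, n_example_73)
```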


Points to be noted for sample size determination 


There are certain issues to be kept in mind before applying the formulas for the
determination of sample size in this unit. First of all, these formulas are applicable
for simple random sampling only. Further, they relate to the sample size needed
for the estimation of a particular characteristic of interest. In a survey, a researcher
needs to estimate several characteristics of interest, and each one of them may
require a different sample size. In case the universe is divided into different strata,
the accuracy required for determining the sample size for each stratum may be
different, and the present method will not be able to serve that requirement.
Lastly, the formulas for sample size must be based upon adequate information
about the universe.


Check Your Progress

6. What are the factors on which the size of a sample depends?

7. Mention the most commonly used approach for determining the size of a
sample.


7.5 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS

1. Sample is the subset of a population.

2. Sampling frame comprises all the elements of a population with proper
identification that are available to us for selection at any stage of sampling.

3. Simple random sampling is the special case of probability sampling design
where every element of the population has both a known and an equal chance
of being selected in the sample.

4. Convenience sampling is the sampling design which is often used in the
pre-test phase of a research study, such as the pre-testing of a questionnaire.

5. Snowball sampling is generally used when it is difficult to identify the members
of the desired population. Under this design each respondent, after being
interviewed, is asked to identify one or more other respondents in the field.

6. The size of a sample depends upon the basic characteristics of the population,
the type of information required from the survey and the cost involved.

7. The most commonly used approach for determining the size of the sample
is the confidence interval approach.


7.6 SUMMARY


• Surveys are useful in information collection. The survey respondents should
be selected using appropriate procedures. The process of selecting the right
individuals, objects or events for the study is known as sampling.

• An alternative to a sample is a census, where each and every element of the
population (universe) is examined. There are many advantages of sampling
over complete enumeration. While estimating the population parameter using
sample results, the researcher may incur two types of error: sampling and
non-sampling error.

• The process of selecting samples from the population is referred to as
sampling design. There are two types of sampling designs: probability
sampling design and non-probability sampling design. Probability sampling
designs are used in conclusive research, whereas non-probability sampling
designs are appropriate for exploratory research.

• There are four probability sampling designs: simple random sampling
with replacement, simple random sampling without replacement, systematic
sampling and stratified random sampling.

• Under the non-probability sampling designs, there are convenience sampling,
judgmental sampling and snowball sampling.


7.7 KEY WORDS 


• Convenience sampling: The type of sampling in which the sample is
selected as per the convenience of the investigator.

• Census: The enumeration of each and every element of population.

• Element: A single member of population.

• Sampling design: The process of selecting samples from a population.

• Sampling error: The error that occurs because of non-representativeness
of the sample.


7.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Differentiate between sample and census. 

2. Differentiate between the stratified random sampling and systematic sampling. 

3. Why is judgemental sampling used in research? Can it result in more 
representative sample than a random sample? 


Long-Answer Questions 


1. Explain the various sources of non-sampling errors. 


2. Explain the difference between simple random sampling with replacement 
and without replacement. 

3. Explain, giving an example, why a random sample may not result in a
representative sample.

4. Explain the factors that should be considered while selecting a sample for
research.

UNIT 8 DATA PROCESSING


Structure 
8.0 Introduction 
8.1 Objectives 
8.2 Data Editing 
8.2.1 Field Editing 
8.2.2 Centralized In-house Editing 
8.3 Coding 
8.3.1 Coding Closed-ended Structured Questions 
8.3.2 Coding Open-ended Structured Questions 
8.4 Classification and Tabulation of Data 
8.5 Answers to Check Your Progress Questions 
8.6 Summary 
8.7 Key Words 
8.8 Self Assessment Questions and Exercises 
8.9 Further Readings 


8.0 INTRODUCTION 


In the last few units, you have learnt about the various aspects of data collection. 
The critical job of the researcher begins after the data has been collected. He has 
to use this information to assess whether he had been correct or incorrect while 
making certain assumptions in the form of the hypotheses at the beginning of the 
study. The raw data that has been collected must be refined and structured in such 
a format that it can lend itself to statistical enquiry. This process of preparing the 
data for an analysis is a structured and sequential process. The process starts by 
validating the measuring instrument, which could be a questionnaire or any other 
primary technique. This is followed by editing, coding, classifying and tabulating 
the obtained data. 


In this unit, we will learn the steps of preparing the data through editing,
coding and tabulating, so that it is ready for any kind of statistical analysis needed
to achieve the research objectives set earlier.


8.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Explain the significance and technique of data processing

• Construct codes for both structured and unstructured questionnaires,
following certain guidelines

• Classify and tabulate data in the required format


8.2 DATA EDITING


Data editing is the process that involves detecting and correcting errors (logical
inconsistencies) in data. After collection, the data is subjected to processing.
Processing requires that the researcher must go over all the raw data forms and
check them for errors. The significance of validation becomes more important in
the following cases:


• In case the form had been translated into another language, expert analysis
is done to see whether the meaning of the questions in the two measures is
the same or not.

• The second case could be that the questionnaire survey has to be done at
multiple locations and it has been outsourced to an outside research agency.

• The respondent seems to have used the same response category for all the
questions; for example, there is a tendency on a five-point scale to give 3 as
the answer for all questions.

• The form that is received back is incomplete, in the sense that either the
person has not filled in the answers to all questions or, in case of a
multiple-page questionnaire, one or more pages are missing.

• The forms received are not in the proportion of the sampling plan. For
example, instead of an equal representation from government and private
sector employees, 65 per cent of the forms are from the government sector.
In such a case the researcher either would need to discard the extra forms
or get an equal number filled in by private sector employees.


Once the validation process has been completed, the next step is the editing
of the raw data obtained. While carrying out the editing the researcher needs to
ensure that:

• The data obtained is complete in all respects.

• It is accurate in terms of information recorded and responses sought.

• Questionnaires are legible and are correctly deciphered, especially the
open-ended questions.

• The response format is in the form that was instructed.

• The data is structured in a manner that entering the information will not
be a problem.

The editing process is carried out at two levels: the first of these is field
editing and the second is central editing.


8.2.1 Field Editing


Usually, the preliminary editing of the information obtained is done by the field
investigators or supervisors, who review the filled forms for any inconsistencies,
non-response, illegible responses or incomplete questionnaires. Thus the errors
can be corrected immediately and, if need be, the respondent who filled in the form
can be contacted again. The other advantage is that regular field editing allows
one to check whether the surveyor is able to handle the process of instructions
and probing correctly. Thus, the researcher can advise and train the
investigator on how to administer the questionnaire correctly.


8.2.2 Centralized In-house Editing


The second level of editing takes place at the researcher’s end. At this stage there 
are two kinds of typical problems that the researcher might encounter. 


First, one might detect an incorrect entry. For example, in case of a five-point
scale one might find that someone has used a value more than 5. In another
case, one might be asking a question like, ‘how many days do you travel out of the
city in a week?’ and the person says ‘15 days’. Here one can carry out a quick
frequency check of the responses; this will immediately detect an unexpected value.
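The quick frequency check mentioned above can be sketched in Python as follows. This is a minimal illustration: the function name `frequency_check` and the sample responses are hypothetical, and the valid ranges (1 to 5 for the scale, 0 to 7 for days in a week) follow the two cases described in the text.

```python
from collections import Counter


def frequency_check(responses, valid_values):
    """Count each response and report values outside the valid set."""
    counts = Counter(responses)
    unexpected = {value: count for value, count in counts.items()
                  if value not in valid_values}
    return counts, unexpected


# Hypothetical answers on a five-point scale; 7 is an incorrect entry.
scale_responses = [3, 4, 5, 2, 7, 1, 3, 5]
scale_counts, scale_errors = frequency_check(scale_responses, range(1, 6))

# Hypothetical answers to 'how many days do you travel out of the city in a week?'
travel_days = [2, 0, 15, 3]
_, travel_errors = frequency_check(travel_days, range(0, 8))

print(scale_errors)   # entries that need editing on the scale item
print(travel_errors)  # entries that need editing on the travel item
```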


The second and the major problem that most researchers face is that of
‘armchair interviewing’ or a fudged interview. One way to handle this is to first
scroll through the answers to the open-ended questions, since these are generally
difficult to fake when an investigator is filling in multiple forms.


The researcher has some standard processes available to him to carry out the 
editing process. These are briefly discussed below. 


Backtracking: The best and the most efficient way of handling unsatisfactory
responses is to return to the field, and go back to the respondents. This technique
is best used for industrial surveys but is somewhat difficult in individual surveys.


Allocating missing values: This is a contingency plan that the researcher might
need to adopt in case going back to the field is not possible. Then the option might
be to assign a missing value to the blanks or the unsatisfactory responses. However,
this works only when:

• The number of blank or wrong answers is small.

• The number of such responses per person is small.

• The important parameters being studied do not have too many blanks,
otherwise the sample size for those variables becomes too small for
generalizations.


Plug value: In cases such as the third condition above, when the variable being
studied is the key variable, the researcher might insert a plug value. Sometimes
one can plug an average or a neutral value, for example a 3 for a five-point scale,
or the researcher might have to establish a rule as to what value will be put if the
person has not answered. Sometimes, the respondent’s pattern of responses to
other questions is used to extrapolate and calculate an appropriate response for
the missing answer.

Discarding unsatisfactory responses: If the response sheet has too many
blanks, illegible answers or multiple responses for a single answer, the form is not
worth correcting and editing. Hence, it is much better to completely discard the
whole questionnaire.
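The treatments described above (allocating a missing value, inserting a plug value and discarding a form) can be sketched as a simple rule in Python. This is only an illustration under stated assumptions: the limit of two blanks per form, the item names and the function name `treat_form` are hypothetical, and the plug value is the neutral value 3 on a five-point scale, as suggested in the text.

```python
MISSING = None     # marker for a blank or unsatisfactory answer
NEUTRAL_VALUE = 3  # plug value for a five-point scale
MAX_BLANKS = 2     # hypothetical rule: discard forms with more blanks


def treat_form(answers, key_items):
    """Return the treated form, or None if the form is to be discarded.

    answers   : dict mapping item name to coded answer (or MISSING)
    key_items : items for which a plug value is inserted when blank
    """
    blanks = sum(1 for value in answers.values() if value is MISSING)
    if blanks > MAX_BLANKS:
        return None  # too many blanks: discard the whole questionnaire

    treated = {}
    for item, value in answers.items():
        if value is MISSING and item in key_items:
            treated[item] = NEUTRAL_VALUE  # plug value for a key variable
        else:
            treated[item] = value          # other blanks stay as missing
    return treated


form_a = {"q1": 4, "q2": MISSING, "q3": 5, "q4": 2}
form_b = {"q1": MISSING, "q2": MISSING, "q3": MISSING, "q4": 1}

treated_a = treat_form(form_a, key_items={"q2"})
treated_b = treat_form(form_b, key_items={"q2"})
print(treated_a, treated_b)
```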


Check Your Progress 


1. What are the levels at which the editing of the raw data takes place? 


2. Mention one way of handling the problem of ‘armchair interviewing’ or a 
fudged interview. 


8.3 CODING 


The process of identifying and denoting a numeral to the responses given by a
respondent is called coding. This is essentially done in order to help the researcher
in recording the data in a tabular form later. It is advisable to assign a numeric
code even for the categorical data (e.g., gender). In fact, even for open-ended
questions, which are in a statement form, we will try to categorize them into numbers.
The reason for doing this is that the graphic representation of data into charts and
figures becomes easier.


Usually, the codes that have been formulated are organized into fields, records
and files. For example, the gender of a person is one field and the codes used
could be 0 for males and 1 for females. All related fields, for example, all the
demographic variables like age, gender, income, marital status and education could
be one record. The records of the entire sample under study form a single file.
The data that is entered in the spreadsheet, such as on Excel, is in the form of a
data matrix, which is simply a rectangular arrangement of the data in rows and
columns. Here, every row represents a single case or record. For example, consider
the following representation from a study on two-wheeler buyers (Table 8.1):


Table 8.1 Sample Record: Excel Sheet for Two-wheeler Owners

Unit      Occupation  Vehicle   Km/day    Marital status  Family size
Column 1  Column 2    Column 3  Column 4  Column 5        Column 6
1         ...         ...       ...       ...             ...
2         ...         ...       ...       ...             ...

Here, the data matrix reveals that each field is denoted on the column head
and each case record is to be read along the row. The data in the first column
represents the unique identification given to a particular respondent (also marked
on his/her questionnaire). The second column has data entered on the basis of a
coding scheme where every occupation is given a number value (for example, 1
stands for government service and 5 stands for student and so on). Column 3 has
1 representing a motorcycle and 2 representing a scooter. The next value is of the
average number of kilometres a person travels per day.

This is followed by the marital status, with 1 signifying unmarried and
2 married. The last column is again a ratio scale data with the number of family
members. The researcher can enter the data on the spreadsheet of the software
package he/she is using for the analysis.
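The data matrix described above can be sketched in Python as a list of rows, one per respondent. The two records below are hypothetical values chosen only to illustrate the coding scheme in the text (occupation 1 = government service, 5 = student; vehicle 1 = motorcycle, 2 = scooter; marital status 1 = unmarried, 2 = married); the field names and helper functions are likewise assumptions.

```python
# Column heads (fields) of the data matrix, in the order used in Table 8.1.
FIELDS = ["unit", "occupation", "vehicle", "km_per_day",
          "marital_status", "family_size"]

# Each row is one case or record; the values are hypothetical.
data_matrix = [
    [1, 1, 1, 20, 2, 4],  # government service, motorcycle, married
    [2, 5, 2, 8, 1, 3],   # student, scooter, unmarried
]


def column(matrix, field):
    """Read one field (column) across all records."""
    index = FIELDS.index(field)
    return [row[index] for row in matrix]


def record(matrix, unit):
    """Read one case (row) as a field-to-value mapping."""
    for row in matrix:
        if row[0] == unit:
            return dict(zip(FIELDS, row))
    return None


print(column(data_matrix, "vehicle"))
print(record(data_matrix, 2))
```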


Codebook formulation: In order to manage the data entry process, it is best to
prepare a method for entering the records. This coding scheme for all the variables
under study is called a code book. Generally, while designing the rules, care must
be taken to decide on categories that are:

• Comprehensive: Should cover all the possible answers to the question that
was asked.

• Mutually exclusive: The categories and codes devised must be exclusive
or clearly different from each other.

• Single variable entry: The response that is being entered and the code
for it should indicate only a single variable. For example, a ‘working single
mother’ might seem a simple category which one could code under
‘occupation’. However, it needs three columns: occupation, marital status
and family life cycle. So, one needs to have three different codes to enter
this information.

Based on the above rules, one creates a code book. This would generally
contain information on the question number, variable name, response descriptors,
coding instructions and the column descriptor.


As we have read in Unit 6, a questionnaire can have both closed-ended and 
open-ended questions. When the questions are structured and the response 
categories are prescribed then one does what is called pre-coding, i.e., giving 
numeral codes to the designed responses before administration. However, if the 
questions are structured and the answers are open ended, one needs to decide on 
the codes after the administration of the survey. This is called post-coding. 


8.3.1 Coding Closed-ended Structured Questions 


The method of coding for structured questions is easier as the response categories 
are decided in advance. The coding method to be followed for different kinds of 
questions is discussed below. 


Dichotomous questions: For dichotomous questions, which are on a nominal 
scale, the responses can be binary, for example: 


Do you eat ready-to-eat food? Yes = 1; No = 0.

This means that if someone eats ready-to-eat food, he/she will be given a score
of 1 and if not, then 0.

Table 8.2 Codebook Extract for Ready-to-eat Food Study

Question  Variable Name            Coding Instruction         Symbol used for
No.                                                           Variable Name
1.        Buy ready-to-eat food    Yes = 1                    X1
          products                 No = 0
2.        Use ready-to-eat food    Yes = 1                    X2
          products                 No = 0
22.       Age                      Less than 20 years = 1,    X22
                                   21-26 years = 2,
                                   27-35 years = 3,
                                   36-45 years = 4,
                                   More than 45 years = 5
23.       Gender                   Male = 1                   X23
                                   Female = 2
24.       Marital status           Single = 1                 X24
                                   Married = 2
                                   Divorced/widow = 3
25.       No. of children          Exact no. to be written    X25
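
A code book such as the extract in Table 8.2 can be held as a simple lookup structure. The Python sketch below is an assumption-laden illustration: the dictionary layout and the function name `encode` are hypothetical, and only some of the variables from the table are included.

```python
# A code book as a mapping from variable symbol to its coding instruction.
# "codes" is None where the exact number is to be written.
CODEBOOK = {
    "X1": {"question": 1, "name": "Buy ready-to-eat food products",
           "codes": {"Yes": 1, "No": 0}},
    "X23": {"question": 23, "name": "Gender",
            "codes": {"Male": 1, "Female": 2}},
    "X24": {"question": 24, "name": "Marital status",
            "codes": {"Single": 1, "Married": 2, "Divorced/widow": 3}},
    "X25": {"question": 25, "name": "No. of children", "codes": None},
}


def encode(symbol, answer):
    """Convert a recorded answer into its numeric code."""
    codes = CODEBOOK[symbol]["codes"]
    if codes is None:
        return int(answer)  # exact number to be written
    return codes[answer]    # raises KeyError for an answer not in the code book


coded_record = {
    "X1": encode("X1", "Yes"),
    "X23": encode("X23", "Female"),
    "X24": encode("X24", "Married"),
    "X25": encode("X25", "3"),
}
print(coded_record)
```

Because an answer that is not listed raises an error, the sketch also acts as a check that the categories are comprehensive.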


Ranking questions: For ranking questions where there are multiple objects to
be ranked, the person will have to make multiple columns, with the number of
columns equalling the number of objects to be ranked. For example, for ranking
TV serials, the code book would be as follows:

Q.No.  Variable Name         Coding Instructions  Symbol Used for
                                                  Variable Name
10     Balika Vadhu          Number from 1-10     X10a
       Sathiya               Number from 1-10     X10b
       Sasural Genda Phool   Number from 1-10     X10c
       Bidai                 Number from 1-10     X10d
       Pathshala             Number from 1-10     X10e


Checklists/multiple responses: In questions that permit a large number of
responses, each possible response option should be assigned a separate column.
For example, consider the following question:

Which of the following newspapers do you read? (Tick all that you read.)

Times of India
Hindustan Times
Mail Today
Indian Express
Deccan Chronicle
Asian Age
Mint

For this question, the number of columns required is seven, one for each
newspaper. The coding instructions for each column would be as follows: in case
the person ticks on a name, the paper = 1, and in case he does not tick, the paper
= 0.
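
The seven-column coding of the newspaper question can be sketched in Python as below. The function name `code_checklist` and the sample respondent are hypothetical; the newspaper list and the 1/0 rule are taken from the text.

```python
NEWSPAPERS = [
    "Times of India", "Hindustan Times", "Mail Today", "Indian Express",
    "Deccan Chronicle", "Asian Age", "Mint",
]


def code_checklist(ticked):
    """One column per newspaper: 1 if ticked, 0 if not ticked."""
    return [1 if paper in ticked else 0 for paper in NEWSPAPERS]


# A hypothetical respondent who ticked two newspapers.
coded_row = code_checklist({"Times of India", "Mint"})
print(coded_row)
```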

Scaled questions: For questions that are on a scale, usually an interval scale, the
question/statement will have a single column and the coding instruction would
indicate what number needs to be allocated for the response options given in the
scale. Consider the following questions.

Please indicate level of your agreement with the following statements.

S.No.  Compared to the Past (5-10 years)             SA   A   N   D   SD
1      The individual customer today shops more
2      The consumer is well informed about market
       offerings

SA - Strongly agree; A - Agree; N - Neutral; D - Disagree; SD - Strongly disagree

The code book for this will look as follows:

Col. no.  Variable Name      Coding Instructions                  Variable Name
          Individual shops   A number from 1-5
          more               SA = 5, A = 4, N = 3, D = 2, SD = 1
          Well informed      - do -

Missing values: It is advisable to use a standard format for signifying a
non-response or a missing value. For example, a code of 9 could be used for a
single-column variable, 99 for a double-column variable, 999 for a three-column
variable and so on. The researcher must take care as far as possible to use a value
that is starkly different from the valid responses. This is one of the reasons why 9
is suggested. However, in case you have a 10-point scale, do not use 9.
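
The coding of a scaled question together with a missing-value code can be sketched in Python as follows. This is a minimal illustration: the names `code_scaled` and `valid_mean` are hypothetical, and the sketch assumes that the missing code 9 must be excluded before any summary is calculated, since it is not a valid scale value.

```python
SCALE_CODES = {"SA": 5, "A": 4, "N": 3, "D": 2, "SD": 1}
MISSING_CODE = 9  # single-column missing value, outside the valid 1-5 range


def code_scaled(response):
    """Code a response on the agreement scale; blanks get the missing code."""
    return SCALE_CODES.get(response, MISSING_CODE)


def valid_mean(coded):
    """Mean of the valid codes only; the missing code is left out."""
    valid = [code for code in coded if code != MISSING_CODE]
    return sum(valid) / len(valid) if valid else None


# Hypothetical responses from four respondents; one did not answer.
coded_item = [code_scaled(r) for r in ["SA", "N", None, "D"]]
print(coded_item)
print(valid_mean(coded_item))
```

Leaving the 9 in the calculation would inflate the average, which is why the missing code has to be starkly different from the valid responses and filtered out at the analysis stage.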


8.3.2 Coding Open-ended Structured Questions 


The coding of open-ended questions is quite difficult as the respondents’ exact
answers are noted on the questionnaire. Then the researcher (either individually or
as a team) looks for patterns and assigns a category code.

The following example is an open-ended question:

If you think lean management was a success so far, please specify three
most significant reasons that have contributed to its success in your opinion.

People gave different answers. Thus, based upon the responses obtained
for the above question, the following post-code book was created:

Col. no.  Variable Name   Coding Instructions               Variable Name
                          Improvement at workplace by       X63a
                          eliminating waste
                          To meet increasing demands
                          of customers
                          To improve quality
                          To achieve corporate goal

8.4 CLASSIFICATION AND TABULATION OF DATA 


Sometimes, the data obtained from the primary instrument is so voluminous that it
becomes difficult to interpret. In such cases, the researcher might decide to reduce
the information into homogeneous categories. This method of arrangement is called
classification of data. This can be done on the basis of class intervals.

Classification by class intervals: Numerical data, like the ratio scale data, can be
classified into class intervals. This is to assist the quantitative analysis of data. For
example, the age data obtained from the sample could be reduced to homogeneous
grouped data, for example all those below 25 form one group, those 25-35 are
another group and so on. Thus, each group will have class limits: an upper and a
lower limit. The difference between the limits is termed as the class magnitude. One
can have class intervals of both equal and unequal magnitude.

The decision on how many classes and whether equal or unequal depends
upon the judgement of the researcher. Generally, multiples of 2 or 5 are preferred.
Some researchers adopt the following formula for determining the size of the class
interval:

$$I = \frac{R}{1 + 3.3 \log N}$$

where,

I = size of class interval,
R = Range (i.e., difference between the values of the largest item and
smallest item among the given items),
N = Number of items to be grouped, and log is the logarithm to the base 10.
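
The formula can be tried out with a short Python sketch. The function name `class_interval_size` and the figures used (a range of 76 over 100 items) are hypothetical; the logarithm is taken to the base 10.

```python
import math


def class_interval_size(data_range, n_items):
    """Size of class interval I = R / (1 + 3.3 log N), log to base 10."""
    return data_range / (1 + 3.3 * math.log10(n_items))


# Hypothetical data: ages running from 18 to 94 (R = 76) for N = 100 items.
interval = class_interval_size(76, 100)
print(round(interval, 2))
```

With these figures the denominator is 1 + 3.3 × 2 = 7.6, so the interval size comes to about 10, which also fits the stated preference for multiples of 5.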


The class intervals that are decided upon could be exclusive, for example: 
10-15 
15-20 
20-25 
25-30 


In this case, the upper limit of each is excluded from the category. Thus we 
read the first interval above as 10 and under 15, the next one as 15 and under 20 
and so on. 


The other kind is inclusive, that is: 
10-15 
16-20 
21-25 
26-30 


Here, both the lower and the upper limits are included in the interval. With
continuous data, an interval written as 10-15 actually means 10-15.99, and it is
recommended that it be written as 10-15.99, as then all possibilities of the
responses are exhausted. However, for discrete data one can use 10-15.
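
The difference between the two kinds of class intervals can be illustrated with a small Python sketch. The function names are hypothetical; the class limits are those listed above.

```python
def exclusive_class(value, limits):
    """Exclusive classes: lower limit included, upper limit excluded."""
    for lower, upper in zip(limits, limits[1:]):
        if lower <= value < upper:
            return f"{lower}-{upper}"
    return None  # value falls outside all the classes


def inclusive_class(value, classes):
    """Inclusive classes: both the lower and the upper limit included."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return f"{lower}-{upper}"
    return None


EXCLUSIVE_LIMITS = [10, 15, 20, 25, 30]
INCLUSIVE_CLASSES = [(10, 15), (16, 20), (21, 25), (26, 30)]

# The value 15 goes to the second class under exclusive intervals,
# but to the first class under inclusive intervals.
print(exclusive_class(15, EXCLUSIVE_LIMITS))
print(inclusive_class(15, INCLUSIVE_CLASSES))
```

A continuous value such as 15.5 falls in none of the inclusive classes as written, which is the reason the text recommends writing 10-15.99 for continuous data.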


Once the categories and codes have been decided upon, the researcher 
needs to arrange the same according to some logical pattern. This is referred to as 
tabulation of data. This involves an orderly arrangement of data into an array 
that is suitable for a statistical analysis. Usually, this is an orderly arrangement of 
the rows and columns. In case there is data to be entered for one variable, the 
process is a simple tabulation and, when it is two or more variables, then one 
carries out a cross-tabulation of data. The method of cross-tabulating the data is 
discussed at length in Unit 12. 
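
A simple tabulation and a cross-tabulation can be sketched in Python using frequency counts. The records below are hypothetical pairs of (occupation, vehicle) invented for illustration only.

```python
from collections import Counter

# Hypothetical coded records: (occupation, vehicle) for six respondents.
records = [
    ("government", "motorcycle"),
    ("government", "scooter"),
    ("student", "scooter"),
    ("student", "scooter"),
    ("government", "motorcycle"),
    ("student", "motorcycle"),
]

# Simple tabulation: frequency of one variable.
occupation_table = Counter(occupation for occupation, _ in records)

# Cross-tabulation: frequency of each combination of two variables.
cross_table = Counter(records)

print(occupation_table)
for (occupation, vehicle), count in sorted(cross_table.items()):
    print(occupation, vehicle, count)
```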


Check Your Progress 
3. What does a code book contain? 
4. How is coding done for scaled questions? 


5. What is class magnitude? 


8.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. The editing of the raw data takes place at two levels, the first of these is field 
editing and the second is central editing. 


2. One way to handle the problem of ‘armchair interviewing’ or a fudged
interview is to first scroll through the answers to the open-ended questions,
since these are generally difficult to fake when an investigator is filling in
multiple forms.

3. A code book contains information on the question number, variable name,
response descriptors, coding instructions and the column descriptor.

4. For questions that are on a scale, usually an interval scale, the question/
statement will have a single column and the coding instruction would indicate
what number needs to be allocated for the response options given in the
scale.

5. In the class interval, each group has a lower and an upper limit. The difference
between the limits is termed as the class magnitude.


8.6 SUMMARY 


• Data processing refers to the processing of the primary data that has been
collected specifically for the study.

• The researcher has to check for omissions or errors. This is the editing
stage of the data processing step. This is done first at the field and then at
the central office level.

• At this stage, the research team conducts some data treatment such as
backtracking, if possible, allocating the missing values and, sometimes,
plugging the incomplete data.

• Once this is completed, the researcher prepares a code book. Classification
into attributes or class intervals is carried out and the entered data is now
ready for analysis in a tabular form.


8.7 KEY WORDS 


• Backtracking: Returning to the field and going back to the respondents in
order to handle unsatisfactory responses.

• Code book: Coding scheme for all the variables under study.

• Coding: The process of identifying and denoting a numeral to the responses
given by a respondent.

• Data tabulation: Arrangement of data according to some logical pattern.


8.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What is data editing? Mention its significance. 


2. Distinguish between field editing and centralized in-house editing. Mention 
the standard processes available to the researcher to carry out the editing 
process. 




Long-Answer Questions 
1. How do you code data? What guidelines should be followed to carry out 
the task? Discuss by giving suitable examples. 


2. Distinguish between coding closed-ended structured questions and coding
open-ended structured questions.


3. Explain the classification and tabulation of data. 

UNIT 9 UNIVARIATE AND
BIVARIATE ANALYSIS
OF DATA


Structure 
9.0 Introduction
9.1 Objectives
9.2 Descriptive vs Inferential Analysis
9.2.1 Descriptive Analysis
9.2.2 Inferential Analysis
9.3 Descriptive Analysis of Univariate Data
9.3.1 Analysis of Nominal Scale Data with only One Possible Response
9.3.2 Analysis of Nominal Scale Data with Multiple Category Responses
9.3.3 Analysis of Ordinal Scaled Questions
9.3.4 Measures of Central Tendency
9.3.5 Measures of Dispersion
9.4 Descriptive Analysis of Bivariate Data
9.5 Answers to Check Your Progress Questions
9.6 Summary
9.7 Key Words
9.8 Self Assessment Questions and Exercises
9.9 Further Readings


9.0 INTRODUCTION 


In the previous unit, we studied the processing of data collected from both primary 
and secondary sources. The next step is to analyse the same so as to draw logical 
inferences from them. The data collected in a survey could be voluminous in nature, 
depending upon the size of the sample. In a typical research study there may be a 
large number of variables that the researcher needs to analyse. The analysis could 
be univariate, bivariate and multivariate in nature. In the univariate analysis, one 
variable is analysed at a time. In bivariate analysis, two variables are analysed 
together and examined for any possible association between them. In multivariate 
analysis, the concern is to analyse more than two variables at a time. 


In this unit, we will concentrate on the descriptive analysis of univariate and
bivariate data.


9.1 OBJECTIVES 


After going through this unit, you will be able to: 


• Distinguish between univariate, bivariate and multivariate analysis.

• Differentiate between descriptive and inferential analysis.

• Discuss the type of descriptive univariate analysis to be carried out on nominal,
ordinal, interval and ratio scale data.

• Explain the descriptive analysis of bivariate data.


9.2 DESCRIPTIVE VS INFERENTIAL ANALYSIS 


At the data analysis stage, the first step is to describe the sample; this is followed
by inferential analysis. Descriptive analysis describes the sample, whereas inferential
analysis deals with generalizing the results obtained from the sample to the population.


9.2.1 Descriptive Analysis 


Descriptive analysis refers to transformation of raw data into a form that will facilitate 
easy understanding and interpretation. Descriptive analysis deals with summary 
measures relating to the sample data. The common ways of summarizing data are 
by calculating average, range, standard deviation, frequency and percentage 
distribution. Below is a set of typical questions that are required to be answered 
under descriptive statistics: 


• What is the average income of the sample?
• What is the standard deviation of ages in the sample?
• What percentage of sample respondents are married?
• What is the median age of the sample respondents?
• Which income group has the highest number of users of the product in question
  in the sample?
• Is there any association between the frequency of purchase of the product and
  the income level of the consumers?


Types of descriptive analysis 


The type of descriptive analysis to be carried out depends on the level at which the
variables are measured: nominal, ordinal, interval or ratio.


Table 9.1 presents the type of descriptive analysis which is applicable under 
each form of measurement. 


Table 9.1 Descriptive Analysis for Various Levels of Measurement 


Type of Measurement    Type of Descriptive Analysis
Nominal                Frequency table, Proportion percentages, Mode
Ordinal                Median, Quartiles, Percentiles, Rank order correlation
Interval               Arithmetic mean, Correlation coefficient
Ratio                  Index numbers, Geometric mean, Harmonic mean


9.2.2 Inferential Analysis


After descriptive analysis has been carried out, the tools of inferential statistics are 
applied. Under inferential statistics, inferences are drawn on population parameters 
based on sample results. The researcher tries to generalize the results to the 
population based on sample results. The analysis is based on probability theory 
and a necessary condition for carrying out inferential analysis is that the sample 
should be drawn at random. The following is an illustrative list of questions that are 
covered under inferential statistics. 


• Is the average age of the population significantly different from 35?
• Is the job satisfaction of unskilled workers significantly related to their
  pay packet?
• Do the users and non-users of a brand vary significantly with respect to age?
• Does advertisement expenditure influence sales significantly?
• Are consumption expenditure and disposable income of households
  significantly correlated?
• Is the proportion of satisfied workers significantly higher for skilled workers
  than for unskilled workers?


Check Your Progress 


1. What are some of the common ways of summarizing data? 


2. What is inferential analysis based on? State its necessary condition. 


9.3 DESCRIPTIVE ANALYSIS OF UNIVARIATE DATA 


The first step under univariate analysis is the preparation of frequency distributions 
of each variable. The frequency distribution is the counting of responses or 
observations for each of the categories or codes assigned to a variable. 


9.3.1 Analysis of Nominal Scale Data with only One Possible Response 


Consider a nominal scale variable—gender of respondents in a survey research. 


Table 9.2 shows both the raw frequency and the percentages of responses 
for each category in case of the variable gender in a sample of 414 respondents. 


Table 9.2 Gender of the Respondent


                   Frequency   Per cent   Valid Per cent   Cumulative Per cent
Valid   Male          301        72.7          72.7               72.7
        Female        113        27.3          27.3              100.0
        Total         414       100.0         100.0


This tabulation process can be done by hand, using tally marks. The results 
indicate that out of a sample of 414 respondents, 301 are male and 113 are female. 
The raw frequencies are often converted into percentages as they are more 
meaningful. In the present case, for example, there are 72.7 per cent male and 
27.3 per cent female respondents. 
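

The tabulation described above can also be sketched in a few lines of Python. This is only an illustration: the function name frequency_table is hypothetical, and the list of raw responses is rebuilt from the counts reported in Table 9.2 rather than taken from the original survey file.

```python
from collections import Counter


def frequency_table(responses):
    """Return rows of (category, frequency, per cent, cumulative per cent)."""
    counts = Counter(responses)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    # most_common() lists the categories from the most to the least frequent
    for category, count in counts.most_common():
        percent = 100 * count / total
        cumulative += percent
        rows.append((category, count, round(percent, 1), round(cumulative, 1)))
    return rows


# Raw responses rebuilt from the counts in Table 9.2 (301 male, 113 female)
responses = ["Male"] * 301 + ["Female"] * 113
table = frequency_table(responses)
for row in table:
    print(row)
```

Each printed row corresponds to one line of the frequency table: the category, its raw frequency, its percentage and the cumulative percentage.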


9.3.2 Analysis of Nominal Scale Data with Multiple Category Responses 


In Section 9.3.1, the variable considered could take only two values, namely male
and female, and only one of the two responses was possible. However, at times the
researcher comes across multiple-category questions, where respondents could
choose more than one answer. In such a case, the preparation of the frequency table
and its interpretation are slightly different. If the question in the research study is
a multiple-category question and the respondents are allowed to tick more than one
choice, the percentages may not add up to 100. For example, one
may consider the following question:


When accessing the internet at a cyber cafe, tick up to four frequently used
applications for which you use the cyber cafe.


 1. E-mail
 2. Chat
 3. Browsing
 4. Downloading
 5. Shopping
 6. Net telephony
 7. Business and Commerce (e-commerce)
 8. Entertainment
 9. Adult sites
10. Astrology and Horoscope
11. Education
12. Any other, please specify.


The coding for the variable applications has been done in binary form, where the
values one and zero are assigned. If the respondent uses a particular application,
the value assigned is 1, otherwise 0. The resulting frequency table for the
above-mentioned question is presented in Table 9.3.


Table 9.3 Frequently used Applications at Cyber Cafe


Application                    Per cent of respondents
E-mail                         94.9
Chat                           76.3
Browsing                       56.0
Downloading                    47.6
Shopping
Net telephony
E-commerce
Entertainment                  32.6
Adult sites
Astrology and horoscopes
Education                      35.4
Any other
TOTAL RESPONDENTS              414


*Total exceeds 100% because of multiplicity of answers.


In Table 9.3 the percentages are computed on the total sample size of 414.
If these percentages are added up, they would exceed 100 per cent.
This is because of multiplicity of answers, as respondents were given the chance to
choose more than one answer. The interpretation of the table would be based on
a sample of 414 and is given as:


• The most used application at a cyber cafe is e-mail. It is seen that 94.9 per
  cent of the users make use of this.
• The second most popular application is chatting, and 76.3 per cent of the sample
  respondents make use of it.
• Similarly, other applications in order of preference are browsing (56 per
  cent), downloading (47.6 per cent), education (35.4 per cent), entertainment
  (32.6 per cent) and so on.
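

The effect of binary coding on the percentages can be sketched as follows. The five coded respondents below are hypothetical and cover only three of the applications; the function name multiple_response_percentages is likewise an assumption made for illustration. The point is that every percentage uses the number of respondents as its base, so the column can add up to more than 100.

```python
def multiple_response_percentages(coded_rows, applications):
    """Percentage of respondents who ticked each application.

    coded_rows is a list of dicts mapping application -> 1 (ticked) or 0.
    The base for every percentage is the number of respondents.
    """
    n = len(coded_rows)
    return {
        app: round(100 * sum(row[app] for row in coded_rows) / n, 1)
        for app in applications
    }


applications = ["E-mail", "Chat", "Browsing"]

# Hypothetical binary-coded answers of five respondents
coded_rows = [
    {"E-mail": 1, "Chat": 1, "Browsing": 0},
    {"E-mail": 1, "Chat": 0, "Browsing": 1},
    {"E-mail": 1, "Chat": 1, "Browsing": 1},
    {"E-mail": 1, "Chat": 1, "Browsing": 0},
    {"E-mail": 0, "Chat": 1, "Browsing": 1},
]

percentages = multiple_response_percentages(coded_rows, applications)
print(percentages)
print("Sum of percentages:", sum(percentages.values()))
```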


9.3.3 Analysis of Ordinal Scaled Questions 


There could always be some ordinal-scaled questions in the questionnaire. The 
question before the researcher is how to tabulate and interpret the responses to 
such questions. It could be done in two ways as would be shown in the following 
example. The questions asked of the respondents in such a case could be: 


• Rank the following five attributes while choosing a restaurant for dinner.
  Assign a rank of 1 to the most important, 2 to the next important ... and 5
  to the least important.


— Ambience 

— Food quality 
— Menu variety 
— Service 

— Location 


From a sample of 32, the responses obtained are given in Table 9.4. To 
construct univariate tables out of the given data, one can take up one column at a 
time from Table 9.4 and prepare the separate frequency tables. For example, 
distribution of rank assigned to attribute food quality may be considered in 
Table 9.5. 


Table 9.4 Ranking of Various Attributes while Selecting a Restaurant for Dinner


Respondent No.   Ambience   Food Quality   Menu Variety   Service   Location


Table 9.5 Distribution of Ranks Assigned to Food Quality


Rank      Frequency   Per cent
1            16         50.0
2            13         40.6
3             2          6.3
4             1          3.1
Total        32        100.0


It is seen from Table 9.5 that out of 32 respondents, 16 (50 per cent) have 
assigned rank one, 13 (40.6 per cent) ranked two, 2 (6.3 per cent) ranked three 
and 1 (3.1 per cent) ranked four to food quality. This shows that food quality is 
given a lot of importance by the respondents. Similar analysis could be carried out 
for other attributes. 


The other way of preparing a univariate table is to find, for a given rank, the
distribution of the attributes that received it. Table 9.6 indicates the distribution
of attributes that received rank one.


Table 9.6 Distribution of Attributes that Received Rank One


Attribute        Number   Percentage
Ambience            4       12.50
Food Quality       16       50.00
Menu Variety        7       21.88
Service             3        9.38
Location            2        6.25
Total              32      100


Table 9.6 indicates that 50 per cent of the respondents gave food quality 
rank one, whereas 21.88 per cent gave menu variety as rank one, followed by 
ambience that was ranked one by 12.5 per cent of the respondents. Similar analysis 
could be carried out corresponding to the remaining attributes. 


The ordinal scale data could also be used for preparing a summarized rank 
order. For example, data presented in Table 9.4 gives the ranking by 32 respondents 
on five attributes while choosing a restaurant for dinner. The data given in Table 
9.4 can be used to prepare the summarized rank ordering of various attributes. 
The rankings of attributes given in Table 9.4 can be presented in the form of 
frequency distribution in Table 9.7. 


Table 9.7 Frequency Table of the Rankings of the Attributes while Selecting
a Restaurant for Dinner


Attribute        Rank 1   Rank 2   Rank 3   Rank 4   Rank 5   Total
Ambience            4        5       13        5        5       32
Food Quality       16       13        2        1        0       32
Menu Variety        7        2        2        9       12       32
Service             3        8       11        6        4       32
Location            2        4        4       11       11       32
Total              32       32       32       32       32


To calculate a summary rank order, the attribute with the first rank was
given the lowest number (1) and the least preferred attribute was given the highest
number (5).


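The summarized rank order is obtained with the following computations:


Ambience       (4 x 1) + (5 x 2) + (13 x 3) + (5 x 4) + (5 x 5) = 98
Food Quality   (16 x 1) + (13 x 2) + (2 x 3) + (1 x 4) + (0 x 5) = 46
Menu Variety   (7 x 1) + (2 x 2) + (2 x 3) + (9 x 4) + (12 x 5) = 113
Service        (3 x 1) + (8 x 2) + (11 x 3) + (6 x 4) + (4 x 5) = 96
Location       (2 x 1) + (4 x 2) + (4 x 3) + (11 x 4) + (11 x 5) = 121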


The total lowest score indicates the first preference ranking. The results 
show the following rank ordering: 


(1) Food quality 
(2) Service 
(3) Ambience 
(4) Menu variety 
(5) Location 


9.3.4 Measures of Central Tendency


There are three measures of central tendency that are used in research: mean,
median and mode.


1. Mean


The mean represents the arithmetic average of a variable and is appropriate for interval
and ratio scale data. The mean is computed as:

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]

Where,
$\bar{X}$ = Mean of some variable X
$X_i$ = Value of the ith observation in the sample
$n$ = Number of observations in the sample

It is also possible to compute the value of mean when interval or ratio scale
data are grouped into categories or classes. The formula for mean in such a case
is given by:

\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]

Where,
$f_i$ = Frequency of the ith class
$X_i$ = Midpoint of the ith class
$k$ = Number of classes


Given below are two examples to illustrate the computation of arithmetic 
mean: 


Example 9.1: The percentage of dividend declared by a company over the last 
12 years is 5, 8, 6, 10, 12, 20, 18, 15, 30, 25, 20, 16. Compute the average 
dividend. 


Solution:
Let $X_i$ denote the dividend declared in the ith year.

\[ \sum_{i=1}^{12} X_i = 185, \qquad \bar{X} = \frac{\sum_{i=1}^{12} X_i}{n} = \frac{185}{12} = 15.417 \]


Therefore, the average dividend declared by the company in the last 12 
years is 15.417 per cent. 
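

The arithmetic of Example 9.1 can be verified with Python's standard library, as in the short sketch below. The extra observation of 200 in the second part is hypothetical; it is added only to show how a single extreme value pulls the mean upwards.

```python
from statistics import mean

# Dividend (per cent) declared over the last 12 years, from Example 9.1
dividends = [5, 8, 6, 10, 12, 20, 18, 15, 30, 25, 20, 16]

total = sum(dividends)
average_dividend = mean(dividends)
print(total, round(average_dividend, 3))

# Hypothetical extreme value, added to illustrate the sensitivity of the mean
with_extreme_value = dividends + [200]
average_with_extreme = mean(with_extreme_value)
print(round(average_with_extreme, 3))
```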


Example 9.2: The sales data of 250 retail outlets in the garment industry gave the 
following distribution. Compute the arithmetic mean of the sales. 


Sales (in ₹ lakh)    No. of firms
0-20                  6
20-40
40-60                34
60-80                46
80-100
100-120
120-140
Total               250


Solution:


Sales (in ₹ lakh)    No. of firms (f)    Mid-point (X)    X x f
0-20                  6                   10                 60
20-40                                     30
40-60                34                   50               1700
60-80                46                   70               3220
80-100                                    90
100-120                                  110
120-140                                  130
Total               250                                   21080

\[ \sum X f = 21080, \qquad \bar{X} = \frac{\sum X f}{\sum f} = \frac{21080}{250} = 84.32 \]

Hence, the average sales of 250 retail outlets in the garments industry is
₹84.32 lakh.


The main limitation of arithmetic mean as a measure of central tendency
is that it is unduly affected by extreme values. Further, it cannot be computed with
open-ended frequency distribution without making assumptions regarding the size
of the class interval of the open-ended classes. In an extremely asymmetrical
distribution, it is not a good measure of central tendency.


2. Median 


The median can be computed for ratio, interval or ordinal scale data. The median
is that value in the distribution such that 50 per cent of the observations are below
it and 50 per cent are above it. The median for ungrouped data is defined as
the middle value when the data are arranged in ascending or descending order of
magnitude. In case the number of items in the sample, n, is odd, the value of the
[(n + 1)/2]th item gives the median. However, if there is an even number of items in
the sample, say of size 2n, the arithmetic mean of the nth and (n + 1)th items gives
the median. It is again emphasized that data need to be arranged in ascending or
descending order of magnitude before computing the median.


Given below are a few examples to illustrate the computation of median: 


Example 9.3: The marks of 21 students in economics are given as: 62, 38, 42, 43,
57, 72, 68, 60, 72, 70, 65, 47, 49, 39, 66, 73, 81, 55, 57, 57, 59. Compute the
median of the distribution.


Solution: 


By arranging the data in ascending order of magnitude, we obtain: 38, 39, 42, 43, 
47, 49, 55, 57, 57, 57, 59, 60, 62, 65, 66, 68, 70, 72, 72, 73, 81. 

The median will be the value of the 11th observation arranged as above.
Therefore, the value of median equals 59. This means 50 per cent of students
score marks below 59 and 50 per cent score above 59.


Example 9.4: What would be the median score in the above example if there
were 22 students in the class and the score of the 22nd student was 79?


Solution:


By arranging the data in ascending order of magnitude, we obtain: 38, 39, 42, 43, 
47, 49, 55, 57, 57, 57, 59, 60, 62, 65, 66, 68, 70, 72, 72, 73, 79, 81. 


The median is given by the average of 11th and 12th observation when 
arranged in ascending order of magnitude. 


The value of 11th observation = 59.
The value of 12th observation = 60.
Mean of 11th and 12th observation = (59 + 60)/2 = 59.5.


Hence 50 per cent of the students score marks below 59.5 and 50
per cent score above 59.5.
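

The positional rule used in Examples 9.3 and 9.4 can be written as a small function. This is a minimal sketch (the function name median is an assumption; Python's statistics.median gives the same results): the data are sorted first, the middle item is returned when the number of items is odd, and the two middle items are averaged when it is even.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th item, i.e. index n // 2 when counting from zero
        return ordered[middle]
    # Even n: arithmetic mean of the two central items
    return (ordered[middle - 1] + ordered[middle]) / 2


# Marks of the 21 students from Example 9.3
marks = [62, 38, 42, 43, 57, 72, 68, 60, 72, 70, 65,
         47, 49, 39, 66, 73, 81, 55, 57, 57, 59]

median_21 = median(marks)
median_22 = median(marks + [79])  # Example 9.4: a 22nd student scoring 79
print(median_21, median_22)
```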


The median could also be computed for grouped data. In that case, first
of all the median class is located and then the median is computed by interpolation,
using the assumption that all items are evenly spread over the entire class interval.
The median for grouped data is computed using the following formula:

\[ \text{Median} = l + \frac{\frac{N}{2} - CF}{f} \times h \]

Where,
$l$ = Lower limit of the median class
$f$ = Frequency of the median class
$CF$ = Cumulative frequency for the class immediately below the class
containing the median
$h$ = Size of the interval of the median class
$N$ = Sum total of all frequencies


Given below is an example to illustrate the computation of median in the 
case of grouped data: 


Example 9.5: The distribution of dividend declared by 77 companies is given in 
the following table. Compute the median of the distribution. 


[Table: Percentage of dividend declared | Number of companies (f)]


Solution:


$l$ = Lower limit of the median class = 30
$f$ = Frequency of the median class = 18
$CF$ = Cumulative frequency for the class immediately below
the class containing the median = 37
$h$ = Size of the interval of the median class = 10
$N$ = Sum total of all frequencies = 77


Substituting these values in the formula for median, we get

\[ \text{Median} = 30 + \frac{\frac{77}{2} - 37}{18} \times 10 = 30 + \frac{1.5}{18} \times 10 = 30.83 \]


The results show that half of the companies have declared less than 30.83 
per cent dividend and the other half have declared more than 30.83 per cent 
dividend. 
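

The substitution in Example 9.5 can be checked with the short sketch below. The function name grouped_median and its parameter names are hypothetical; the values passed in are the ones listed in the solution above.

```python
def grouped_median(lower_limit, f, cf, h, n):
    """Median of grouped data by interpolation within the median class.

    lower_limit: lower limit of the median class (l)
    f:  frequency of the median class
    cf: cumulative frequency of the classes below the median class
    h:  size of the median class interval
    n:  sum total of all frequencies (N)
    """
    return lower_limit + ((n / 2 - cf) / f) * h


median_dividend = grouped_median(lower_limit=30, f=18, cf=37, h=10, n=77)
print(round(median_dividend, 2))
```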


The limitation of median as a measure of central tendency is that it does not
use each and every observation in its computation since it is a positional average.


3. Mode 


The mode is that measure of central tendency which is appropriate for nominal or 
higher order scales. It is the point of maximum frequency in a distribution around 
which other items of the set cluster densely. Mode should not be computed for 
ordinal or interval data unless these data have been grouped first. The concept is 
widely used in business, e.g. a shoe store owner would be naturally interested in 
knowing the size of the shoe that the majority of the customers ask for. Similarly, a 
garment manufacturer is interested in determining the size of the shirt that fits most 
people so as to plan its production accordingly. 


Example 9.6: The marks of 20 students of a class in statistics are given as under:
44, 52, 40, 61, 58, 52, 63, 75, 87, 52, 63, 38, 44, 61, 68, 75, 72, 52, 51, 50.
Compute the mode of the distribution.


Solution:


It is observed that the maximum number of students (four) have obtained 52 marks. 
Therefore, the mode of the distribution is 52. 


In the case of grouped data, the following formula may be used:

\[ \text{Mode} = l + \frac{f - f_1}{2f - f_1 - f_2} \times h \]

Where,
$l$ = Lower limit of the modal class
$f_1$, $f_2$ = The frequencies of the classes preceding and following
the modal class respectively
$f$ = Frequency of modal class
$h$ = Size of the class interval


Given below is an example to illustrate the computation of mode in grouped
data:


Example 9.7: The data in the following frequency distribution is about monthly
wages of semi-skilled workers in a town. Compute the modal wage.


Monthly wages (₹)    Number of workers
7000-8000            24
8000-9000            32
9000-10000           28
...
11000-12000          16
...

Solution:
The mode is given by the formula

\[ \text{Mode} = l + \frac{f - f_1}{2f - f_1 - f_2} \times h \]

Where
$l$ = Lower limit of the modal class = 8000
$f_1$, $f_2$ = The frequencies of the classes preceding and following
the modal class respectively = 24, 28
$f$ = Frequency of modal class = 32
$h$ = Size of the class interval = 1000

\[ \text{Mode} = 8000 + \frac{32 - 24}{64 - 24 - 28} \times 1000 = 8666.7 \]


Hence, modal wages are ₹8666.7.
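

The same substitution can be written as a small function. This is a sketch for checking the arithmetic of Example 9.7; the function name grouped_mode and its parameter names are hypothetical.

```python
def grouped_mode(lower_limit, f, f1, f2, h):
    """Mode of grouped data.

    lower_limit: lower limit of the modal class (l)
    f:  frequency of the modal class
    f1: frequency of the class preceding the modal class
    f2: frequency of the class following the modal class
    h:  size of the class interval
    """
    return lower_limit + (f - f1) / (2 * f - f1 - f2) * h


modal_wage = grouped_mode(lower_limit=8000, f=32, f1=24, f2=28, h=1000)
print(round(modal_wage, 1))
```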


Another important concept is skewness, which measures lack of symmetry
in the distribution. In case of a symmetrical distribution, mean = median = mode.
For a positively skewed distribution, mean > median > mode. In such a case, the
longer tail of the distribution is towards the right, the mode falls under the peak and
the mean is pulled towards the tail as it is affected by extreme values. The reverse
holds for a negatively skewed distribution, where the longer tail is towards the left
and arithmetic mean < median < mode.


The skewness is measured by the difference between the arithmetic mean and
the mode. If the value of the arithmetic mean is greater than the mode, this difference
is positive and skewness is positive; if the difference is negative, skewness is negative.


9.3.5 Measures of Dispersion 


The measures of central tendency locate the centre of the distribution. However,
they do not provide enough information to the researcher to fully understand the
distribution being examined. There is a need to study the spread of the distribution
of a variable, and the methods which provide that are called measures of dispersion.


The study of dispersion could help in taking better decisions. This is because
small dispersion indicates high uniformity of the items, whereas large variability denotes
less uniformity. If returns on a particular investment show a lot of variability (dispersion),
it is a risky investment compared to one where variability is very small.
The various measures of dispersion are discussed below:


(i) Range: This is the simplest measure of dispersion and is defined as
the distance between the highest (maximum) value and the lowest
(minimum) value in an ordered set of values. The range could be
computed for interval scale and ratio scale data.

\[ \text{Range} = X_{max} - X_{min} \]

Where,
$X_{max}$ = Maximum value of the variable
$X_{min}$ = Minimum value of the variable


The limitation of range as a measure of dispersion is that it considers
only the extreme values and ignores all other data points. The value of
range could vary considerably from sample to sample. Even with this
limitation, range as a measure of dispersion is widely used in industrial
quality control for the preparation of control charts.


Example 9.8: The following are the prices of shares of a company from Monday
to Friday. Calculate the range of the distribution.


[Table: Day (Monday to Friday) | Price (₹)]


Solution:
L = Largest value = 210
S = Smallest value = 100
Therefore, range = L - S = 210 - 100 = 110.


In the case of a frequency distribution, range is calculated by taking
the difference between the lower limit of the lowest class and the upper
limit of the highest class. The limitation of range is that it is not based
on each and every observation of the distribution and, therefore, does
not take into account the form of distribution within the range.


(ii) Variance and standard deviation: Variance is defined as the mean
squared deviation of a variable from its arithmetic mean. The positive
square root of the variance is called standard deviation. The variance is a
difficult measure to interpret and, therefore, standard deviation is used
as a measure of dispersion. The population standard deviation is
denoted by $\sigma$ and computed using the following formula:

\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

Where,
$\sigma$ = Population standard deviation
$X$ = Value of observations
$\mu$ = Population mean of observations
$N$ = Total number of observations in the population.


However, in survey research, we generally take a sample from the
population. If the standard deviation is computed from the sample
data, the following formula may be used.

\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

Where,
$s$ = Sample standard deviation
$\bar{X}$ = Sample mean
$X$ = Value of observation
$n$ = Total number of observations in the sample


In case of grouped data, the following formula for computing sample
standard deviation may be used:

\[ s = \sqrt{\frac{\sum f_i (X_i - \bar{X})^2}{n - 1}} \]

Where,
$X_i$ = Value of the ith observation (the midpoint of the ith class interval)
$\bar{X}$ = Sample mean
$f_i$ = Frequency of the ith class interval
$n$ = Sample size


The standard deviation could be computed in case of interval and
ratio scale data.


Example 9.9: Sample data of 10 days’ sales from the two-month data collected on 
daily basis is given below. Compute the sample variance and standard deviation. 


\[ \sum (X - \bar{X})^2 = 813.6 \]

\[ \text{Variance} = s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{813.6}{9} = 90.4 \]

\[ \text{Standard deviation} = s = \sqrt{90.4} = 9.508 \]

Therefore, the standard deviation of sales of 10 days is 9.508 units.


Example 9.10: The data on dividend declared in percentage is presented in the
following frequency distribution table for a sample of 107 companies. Compute
the variance and standard deviation of the dividend declared.


Dividend declared (per cent)    Number of companies
0-10                             5
10-20                           10
20-30                           13
30-40                           25
40-50                           30
50-60                           16
60-70                            8


Solution:

Dividend declared   Number of       Mid-point   f x X    X - \bar{X}   (X - \bar{X})^2   f (X - \bar{X})^2
(per cent)          companies (f)   (X)
0-10                  5               5           25     -33.5514      1125.697          5628.483
10-20                10              15          150     -23.5514       554.6685         5546.685
20-30                13              25          325     -13.5514       183.6405         2387.326
30-40                25              35          875      -3.5514        12.61246         315.3114
40-50                30              45         1350       6.448598      41.58442        1247.533
50-60                16              55          880      16.4486       270.5564         4328.902
60-70                 8              65          520      26.4486       699.5283         5596.227
Total               107                         4125                                    25050.47

\[ \sum f X = 4125, \qquad \bar{X} = \frac{\sum f X}{\sum f} = \frac{4125}{107} = 38.5514 \]

\[ \sum f (X - \bar{X})^2 = 25050.47 \]

\[ \text{Variance} = s^2 = \frac{\sum f (X - \bar{X})^2}{n - 1} = \frac{25050.47}{106} = 236.3252 \]

\[ s = \text{Standard deviation} = \sqrt{236.3252} = 15.373 \]

Therefore, the standard deviation of the dividend declared of 107
companies is 15.373 per cent.
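

The grouped-data computation of Example 9.10 can be reproduced with the sketch below. The function name grouped_variance is hypothetical, and the class frequencies used are the ones implied by the f x X column of the solution table (for instance 875/35 = 25 for the 30-40 class), which is stated here as an assumption.

```python
from math import sqrt


def grouped_variance(midpoints, frequencies):
    """Return (mean, sample variance) for grouped data, using n - 1 as divisor."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
    squared_deviations = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
    return mean, squared_deviations / (n - 1)


# Midpoints of the classes 0-10, 10-20, ..., 60-70
midpoints = [5, 15, 25, 35, 45, 55, 65]
# Frequencies implied by the f x X column of the solution table (assumption)
frequencies = [5, 10, 13, 25, 30, 16, 8]

mean_dividend, variance = grouped_variance(midpoints, frequencies)
standard_deviation = sqrt(variance)
print(round(mean_dividend, 4), round(variance, 4), round(standard_deviation, 3))
```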


(iii) Coefficient of variation: This measure is computed for ratio scale
measurement. The standard deviation measures the variability of a
variable around the mean. The unit of measurement of standard
deviation is the same as that of arithmetic mean of the variable itself.
The measure of dispersion is considerably affected by the unit of
measurement. In such a case, it is not possible to compare the variability
of two distributions using standard deviation as a measure of variability.
To compare the variability of two or more distributions, a measure of
relative dispersion called the coefficient of variation can be used. This
measure is independent of units of measurements. The formula of
coefficient of variation is:

\[ CV = \frac{s}{\bar{X}} \times 100 \]

Where,
$CV$ = Coefficient of variation
$s$ = Standard deviation of sample
$\bar{X}$ = Mean of the sample


Example 9.11: For the data given in Example 9.10, compute the coefficient of
variation.


Solution:

\[ CV = \frac{s}{\bar{X}} \times 100 \]

Where,
$CV$ = Coefficient of variation
$s$ = Standard deviation of sample = 15.373
$\bar{X}$ = Mean of the sample = 38.5514

\[ \text{Therefore, } CV = \frac{15.373 \times 100}{38.5514} = 39.88 \text{ per cent} \]

Therefore, the coefficient of variation is 39.88 per cent. As already mentioned,
coefficient of variation is useful for comparing the variability of two distributions.
This is a more useful measure when two distributions are entirely different and the
units of measurements are also different.


Some more examples


Individual series


1. Find arithmetic mean of the following data.
58 67 60 84 93 98 100
Arithmetic mean = ΣX/n
Where,
ΣX = the sum of the items
n = the number of items in the series
ΣX = 58 + 67 + 60 + 84 + 93 + 98 + 100 = 560
n = 7
Arithmetic mean = 560/7 = 80


2. Find arithmetic mean of the following distribution.
2.0 1.8 2.0 2.0 1.9 2.0 1.8 2.3 2.5 2.3
1.9 2.2 2.0 2.3
Arithmetic mean = ΣX/n
Where,
ΣX = the sum of the items
n = the number of items in the series
ΣX = 2.0 + 1.8 + 2.0 + 2.0 + 1.9 + 2.0 + 1.8 + 2.3 + 2.5 + 2.3 + 1.9
+ 2.2 + 2.0 + 2.3 = 29
n = 14
Arithmetic mean = 29/14 = 2.07


Discrete series


3. Calculate arithmetic mean of the following 50 workers according to their
daily wages.
Daily wages       : 15 18 20 25 30 35 40 42 45
Number of workers :  2  3  5 10 12 10  5  2  1


Arithmetic mean using direct formula


Wages (X)    Frequency (f)    fx
15            2                30
18            3                54
20            5               100
25           10               250
30           12               360
35           10               350
40            5               200
42            2                84
45            1                45
             Σf = 50          Σfx = 1473

Arithmetic mean = Σfx/Σf
Where,
Σfx = 1473
Σf = 50
Arithmetic mean = 1473/50 = 29.46




Continuous series


4. Find arithmetic mean for the following distribution.


Marks           : 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90
No. of students :   6    12    18    20    20    14     8     2

Marks    Frequency (f)    Mid value (X)    fx
10-20     6                15                90
20-30    12                25               300
30-40    18                35               630
40-50    20                45               900
50-60    20                55              1100
60-70    14                65               910
70-80     8                75               600
80-90     2                85               170
         Σf = 100                          Σfx = 4700

Arithmetic mean = Σfx/Σf
Where,
Σfx = 4700
Σf = 100
Arithmetic mean = 4700/100
= 47
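

For a continuous series, the mid values and the products fx can be generated rather than worked out by hand. The sketch below uses the class limits and frequencies of exercise 4; the variable names are chosen only for illustration.

```python
# Class limits and frequencies from exercise 4
class_limits = [(10, 20), (20, 30), (30, 40), (40, 50),
                (50, 60), (60, 70), (70, 80), (80, 90)]
frequencies = [6, 12, 18, 20, 20, 14, 8, 2]

# Mid value of each class = (lower limit + upper limit) / 2
mid_values = [(lower + upper) / 2 for lower, upper in class_limits]

total_f = sum(frequencies)
total_fx = sum(f * x for f, x in zip(frequencies, mid_values))
grouped_mean = total_fx / total_f
print(total_f, total_fx, grouped_mean)
```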


5. Calculate the range of the following distribution, showing income of 10
workers. Also calculate the co-efficient of range.


25 37 40 23 58 75 89 20 81 95
Range = H - L
H = Highest value = 95
L = Lowest value = 20
Range = 95 - 20 = 75
Co-efficient of range = (H - L)/(H + L)
= (95 - 20)/(95 + 20)
= 75/115
= 0.6522


6. Find the quartile deviation and its co-efficient.
20 28 40 12 30 15 50
First of all, arrange the data in ascending order.
12 15 20 28 30 40 50
Q1 = Size of (N + 1)/4th item
= Size of (7 + 1)/4th item
= Size of (8/4)th item
= 2nd item
= 15
Q3 = Size of 3 x (N + 1)/4th item
= Size of 3 x (7 + 1)/4th item
= Size of 3 x (8/4)th item
= (3 x 2)nd item
= 6th item
= 40
Quartile deviation = (Q3 - Q1)/2
= (40 - 15)/2
= 12.5
Co-efficient of quartile deviation = (Q3 - Q1)/(Q3 + Q1)
= (40 - 15)/(40 + 15)
= 25/55
= 0.4545
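

Exercises 5 and 6 can be checked with the sketch below. The helper quartile_by_position is hypothetical; it follows the "size of k(N + 1)/4th item" rule used above and, as an added assumption, interpolates linearly when that position is not a whole number. It is intended for samples of at least three items.

```python
def quartile_by_position(values, k):
    """k-th quartile as the k(N + 1)/4-th item of the sorted data (k = 1, 2 or 3)."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / 4  # 1-based position
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: linear interpolation between neighbouring items
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Exercise 5: range and co-efficient of range
incomes = [25, 37, 40, 23, 58, 75, 89, 20, 81, 95]
highest, lowest = max(incomes), min(incomes)
value_range = highest - lowest
coefficient_of_range = (highest - lowest) / (highest + lowest)

# Exercise 6: quartile deviation and its co-efficient (values as in the sorted list)
data = [12, 15, 20, 28, 30, 40, 50]
q1 = quartile_by_position(data, 1)
q3 = quartile_by_position(data, 3)
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(value_range, round(coefficient_of_range, 4))
print(q1, q3, quartile_deviation, round(coefficient_of_qd, 4))
```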


9.4 DESCRIPTIVE ANALYSIS OF BIVARIATE DATA 


As already mentioned, bivariate analysis examines the relationship between two 
variables. There are various methods used for carrying out bivariate analysis. We 
will discuss two methods, namely, cross-tabulation and correlation coefficient. 


(i) Cross-tabulation 


In simple tabulation, the frequency and the percentage for each question were
calculated. In cross-tabulation, responses to two questions are combined and
the data are tabulated together. A cross-tabulation counts the number of observations
in each cross-category of two variables. The descriptive result of a
cross-tabulation is a frequency count for each cell in the analysis. For example, in
cross-tabulating a two-category measure of income (low- and high-income
households) with a two-category measure of purchase intention of a product
(low and high purchase intentions), the basic result is a cross-classification as
shown in Table 9.8.


Table 9.8 Cross-table of Purchase Intention and Income


                                             Income
                                      Low Income   High Income
Purchase    Low purchase intention       120            60
Intention   High purchase intention       80           190
            Total                        200           250


The results of cross-tabulation show the number of sample respondents 
with low income having low purchase intention, low income with high purchase 
intention, high income with low purchase intention and high income with high 
purchase intention. 


As is the case with simple tabulations, the results of a cross-tabulation are
more meaningful if cell frequencies are computed as percentages. For a table such
as Table 9.8, the percentages can be computed in three ways: (1) row-wise, so that
the percentages in each row add up to 100 per cent; (2) column-wise, so that the
percentages in each column add up to 100 per cent; or (3) as cell percentages, such
that percentages added across all cells equal 100 per cent. The interpretation of
percentages is different in each of the three cases. Therefore, the question arises
as to which of these percentages is most useful to the researcher. What is the
general rule for computing percentages?


The basis for calculating category percentage depends upon the nature of
relationship between the variables. One of the variables could be viewed as the
dependent variable and the other one as the independent variable. In the
cross-tabulation presented in Table 9.8, the purchase intention could be treated as
the dependent variable, which depends upon income (independent variable). The rule
is to cast percentages in the direction of the independent (causal) variable across the
dependent variable. For Table 9.8, there are 200 respondents with low income,
out of which 120 have low purchase intention for the product. In terms of
percentages, 60 per cent of the respondents with low income have low purchase
intention for the product and 40 per cent have high purchase intention. There are
250 respondents with high income, out of which 60 have low purchase intention and
190 have high purchase intention for the product. By calculating percentages
column-wise, it is seen that 24 per cent of them have low purchase intention whereas
76 per cent have high purchase intention for the product. The results indicate that
with increase in income, the purchase intention for the product increases.


Table 9.9 presents the percentages column-wise as given below: 


Table 9.9 Cross-table of Purchase Intention and Income
(Column-wise Percentages)


                                      Low Income   High Income
Purchase    Low purchase intention       60%           24%
Intention   High purchase intention      40%           76%


From the above example, it is clear that any two variables each having
certain categories can be cross-tabulated. The interpretation of the cross-tabulation
results may show a high association between two variables. That does not mean
that one of them, the independent variable, is the cause of the other, the
dependent variable. Causality between the two variables is more of an assumption
made by the researcher based on his experience or expectations. Just because
there is a high association between two variables, it does not imply a
cause-and-effect relationship.


(ii) Correlation coefficient 


Simple correlation measures the degree of association between two variables. 
The correlation could be positive, negative or zero. 


Quantitative estimate of a linear correlation 


A quantitative estimate of a linear correlation between two variables X and Y is 
given by Karl Pearson as: 


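$$ r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$

which may be rewritten as:

$$ r_{xy} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sqrt{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}} $$

Where,
$r_{xy}$ = Correlation coefficient between X and Y
$\bar{X}$ = Mean of the variable X
$\bar{Y}$ = Mean of the variable Y
n = Size of the sample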


It may be noted that the above-mentioned formulae are for the linear 
correlation coefficient. The linear correlation coefficient takes a value between —1 
and +1 (both values inclusive). If the value of the correlation coefficient is equal to 
1, the two variables are perfectly positively correlated. Similarly, if the correlation 
coefficient between the two variables X and Y is —1, such a correlation is called 
perfect negative correlation. Let us consider an example to show the computation 
and interpretation of correlation coefficient. 


Example 9.12: Consider the data on the quantity demanded and the price of a 
commodity over a ten-year period as given in the following table: 


Year Demand Price 
1996 100 
1997 75 
1998 80 
1999 
2000 
2001 
2002 
2003 
2004 
2005 


Estimate the correlation coefficient between the quantity demanded and 
price and interpret the same. 


Solution: 


This problem will be attempted first by showing all the detailed computations using 
the following formula. 


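$$ r = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sqrt{\sum X_i^2 - n\bar{X}^2}\,\sqrt{\sum Y_i^2 - n\bar{Y}^2}} $$

The required computations are shown in the following table, where X denotes the price and Y the quantity demanded:


Year | Demand (Y) | Price (X) | XY | X² | Y²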

5 500 10000 

6 6400 

6 4900 

8 2500 

7 4225 

5 8100 

10000 

3 12100 

67450 
Substituting these values in the formula for the correlation coefficient, we get:

$$ r = \frac{4500 - 10 \times 6 \times 80}{\sqrt{390 - 10 \times 6 \times 6}\,\sqrt{67450 - 64000}} = \frac{4500 - 4800}{\sqrt{390 - 360}\,\sqrt{67450 - 64000}} = \frac{-300}{\sqrt{30}\,\sqrt{3450}} = \frac{-300}{5.477 \times 58.737} = \frac{-300}{321.701} = -0.9325 $$

The value of the correlation coefficient between the quantity demanded and price is -0.9325, which is negative and very high. This shows that the quantity demanded and price move in opposite directions.
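
The substitution above can be checked numerically. The sketch below assumes only the summary quantities used in the worked solution (ΣXY = 4500, ΣX² = 390, ΣY² = 67450, means 6 and 80, n = 10); the function name `pearson_r_from_sums` is a hypothetical helper introduced for illustration.

```python
import math


def pearson_r_from_sums(sum_xy, sum_x2, sum_y2, mean_x, mean_y, n):
    """Pearson correlation from the computational (sums) form of the formula."""
    numerator = sum_xy - n * mean_x * mean_y
    denominator = math.sqrt(sum_x2 - n * mean_x ** 2) * math.sqrt(sum_y2 - n * mean_y ** 2)
    return numerator / denominator


# Summary values taken from the worked solution of Example 9.12.
r = pearson_r_from_sums(sum_xy=4500, sum_x2=390, sum_y2=67450, mean_x=6, mean_y=80, n=10)
print(round(r, 4))
```

The sums form is convenient for hand computation because it needs only column totals, not the individual deviations from the means.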


Exercises 


1. You are presented with the following table of frequency counts to show the 
nature ofrelationship between age and watching of movies in a cinema hall. 
What conclusion can be drawn? 


Frequency of 
watching movies 


4 or more times in a month 


Less than 4 times in a month 


2. The following bivariate table was prepared to understand the relationship 
between preference for continental food and monthly income of the 
respondents. What conclusion can be drawn? 


Monthly income: Less than ₹30,000 | ₹30,000 to ₹60,000 | More than ₹60,000


Preference for 


continental food 


3. The table below presents the ranks which were assigned by three judges to the works of ten artists:

Compute the correlation coefficient for each pair of rankings and decide:
(a) Which two judges are most alike in their opinions about these artists?
(b) Which two judges differ the most in their opinions about these artists?


Check Your Progress 
3. For which types of data can the median be computed?
4. State the relationship that characterizes a positively skewed distribution.
5. Mention the limitation of range as a measure of dispersion.
6. What is the descriptive result of a cross-tabulation?


9.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. Some of the common ways of summarizing data are by calculating average, range, standard deviation, frequency, and percentage distribution.
2. Inferential analysis is based on probability theory, and a necessary condition for carrying out inferential analysis is that the sample should be drawn at random.
3. The median can be computed for ratio, interval or ordinal scale data.
4. In a positively skewed distribution, mean > median > mode.
5. The limitation of range as a measure of dispersion is that it considers only the two extreme values and ignores all other data points.
6. The descriptive result of a cross-tabulation is a frequency count for each cell in the analysis.


9.6 SUMMARY 


• Data analysis could be univariate, bivariate and multivariate. Further, it could be descriptive or inferential.
• The type of analysis depends upon the level of measurement, i.e., nominal, ordinal, interval and ratio.
• The bivariate analysis of data is illustrated through cross-table and correlation coefficient.


9.7 KEY WORDS


• Bivariate analysis: The data analysis that deals with the analysis of two variables at a time.
• Inferential analysis: The data analysis that attempts to generalize the results of a sample.
• Median: The value in the distribution such that 50 per cent of observations in the distribution are below it and 50 per cent are above it.
• Mode: The point of maximum frequency in a distribution.
• Univariate analysis: The data analysis that deals with the analysis of one variable at a time.


9.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. Differentiate between descriptive and inferential analysis of data. 
2. Briefly explain analysis of nominal scale data with multiple category 
responses. 


Long-Answer Questions 


1. Explain with the help of examples various measures of central tendency. 
2. Discuss various measures of dispersions. List out their merits and demerits. 


3. How does one go about preparing a cross-table between two variables, each having two categories? In what ways should percentages be calculated to interpret the results of a cross-tabulation?


4. You are presented with the following table of frequency counts to show the 
nature of relationship between age and watching of movies in a cinema hall. 
What conclusion can be drawn? 


Frequency of                      Age
watching movies                   Under 35    35 & above

4 or more times in a month

Less than 4 times in a month


5. Compute the correlation coefficient between sales and advertising expenditure of a company from the given data. Also interpret the results.


Sales (₹ crore)         236   300   453   500   720
Advt. Exp. (₹ crore)    10    11    13    14    17


UNIT 10 TESTING OF HYPOTHESES

10.8 Summary
10.9 Key Words 
10.10 Self Assessment Questions and Exercises 
10.11 Further Readings 


10.0 INTRODUCTION 


A hypothesis is an assumption or a statement that may or may not be true. The hypothesis is tested on the basis of information obtained from a sample. Instead of asking, for example, what the mean assessed value of an apartment in a multistoried building is, one may be interested in knowing whether or not the assessed value equals some particular value, say ₹80 lakh. Some other examples could be whether a new drug is more effective than the existing drug based on the sample data, and whether the proportion of smokers in a class is different from 0.30. The formulation of hypothesis has already been discussed in Unit 2. We will now study the concepts and steps in the testing of hypothesis exercise.


10.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Discuss the concepts used in the testing of hypothesis exercise
• Explain the steps used in testing of hypothesis exercise
• Explain the test of the significance of the mean of a single population using both t and Z tests
• Explain the test of the significance of difference between two population means using t and Z tests
• Discuss the test of the significance of a single population proportion
• Explain the test of the significance of the difference between two population proportions using a Z test

10.2 CONCEPTS IN TESTING OF HYPOTHESIS 


Below are discussed some concepts on testing of hypotheses to be used in this 
unit. 

• Null hypothesis: The hypotheses that are proposed with the intent of receiving a rejection for them are called null hypotheses. This requires that we hypothesize the opposite of what is desired to be proved. For example, if we want to show that sales and advertisement expenditure are related, we formulate the null hypothesis that they are not related. If we want to prove that the average wage of skilled workers in town 1 is greater than that of town 2, we formulate the null hypothesis that there is no difference in the average wages of the skilled workers in the two towns. A null hypothesis is denoted by H₀.

• Alternative hypothesis: Rejection of the null hypothesis leads to the acceptance of the alternative hypothesis. The rejection of the null hypothesis indicates that the relationship between variables (e.g., sales and advertisement expenditure), the difference between means (e.g., wages of skilled workers in town 1 and town 2) or the difference between proportions is statistically significant, whereas the acceptance of the null hypothesis indicates that these differences are due to chance. The alternative hypothesis is denoted by H₁.

• One-tailed and two-tailed tests: A test is called one-sided (or one-tailed) if the null hypothesis gets rejected only when the value of the test statistic falls in one specified tail of the distribution. The test is called two-sided (or two-tailed) if the null hypothesis gets rejected when the value of the test statistic falls in either of the two tails of its sampling distribution. For example, consider a soft drink bottling plant which dispenses soft drinks in bottles of 300 ml capacity. The bottling is done through an automatic plant. An overfilling of bottles (liquid content more than 300 ml) means a huge loss to the company given the large volume of sales. An underfilling means the customers are getting less than 300 ml of the drink when they are paying for 300 ml. This could bring a bad reputation to the company. The company wants to avoid both overfilling and underfilling. Therefore, it would prefer to test the hypothesis whether the mean content of the bottles is different from 300 ml. This hypothesis could be written as:

H₀ : μ = 300 ml
H₁ : μ ≠ 300 ml

The hypotheses stated above are called two-tailed or two-sided hypotheses. However, if the concern is the overfilling of bottles, it could be stated as:

H₀ : μ = 300 ml
H₁ : μ > 300 ml

Such hypotheses are called one-tailed or one-sided hypotheses and the researcher would be interested in the upper tail (right hand tail) of the distribution. If, however, the concern is loss of reputation of the company (underfilling of the bottles), the hypothesis may be stated as:

H₀ : μ = 300 ml
H₁ : μ < 300 ml

The hypothesis stated above is also a one-tailed test, and the researcher would be interested in the lower tail (left hand tail) of the distribution.


Type I and type II error: The acceptance or rejection of a hypothesis is 
based upon sample results and there is always a possibility of sample not being 
representative of the population. This could result in errors, as a consequence of 
which inferences drawn could be wrong. The situation could be depicted as given 
in Figure 10.1. 


                 Accept H₀            Reject H₀

H₀ True          Correct decision     Type I Error

H₀ False         Type II Error        Correct decision


Fig. 10.1 Type I and Type II Errors 


If the null hypothesis H₀ is true and is accepted, or H₀ is false and is rejected, the decision is correct in either case. However, if H₀ is rejected when it is actually true, the researcher is committing what is called a Type I error. The probability of committing a Type I error is denoted by alpha (α). This is termed the level of significance. Similarly, if the null hypothesis H₀ is accepted when it is false, the researcher is committing a Type II error. The probability of committing a Type II error is denoted by beta (β). The expression 1 - β is called the power of the test. To decrease the risk of committing both types of errors, you may increase the sample size.
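
The effect of sample size on the Type II error can be sketched numerically for a two-tailed Z test. The example below is an illustration under stated assumptions, not part of the original text: it reuses the 300 ml bottling hypothesis, but the population standard deviation (5 ml) and the true mean (301 ml) are hypothetical values, and `type_ii_error` is a helper name chosen for this sketch.

```python
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a two-tailed Z test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # upper critical value Z_{alpha/2}
    standard_error = sigma / n ** 0.5
    shift = (mu_true - mu0) / standard_error     # mean of Z under the true mean
    # H0 is accepted when -z_crit <= Z <= z_crit; under the true mean Z ~ N(shift, 1).
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


# Hypothetical scenario: sigma = 5 ml, true mean = 301 ml (both are assumptions).
betas = []
for n in (25, 100, 400):
    beta = type_ii_error(mu0=300, mu_true=301, sigma=5, n=n)
    betas.append(beta)
    print(n, round(beta, 3), round(1 - beta, 3))  # n, beta, power
```

With α held fixed, β falls and the power 1 - β rises as n grows, which is the sense in which a larger sample reduces the risk of error.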


10.2.1 Steps in Testing of Hypothesis Exercise 


The following steps are followed in the testing of a hypothesis: 


Setting up of a hypothesis: The first step is to establish the hypothesis to be tested. As it is known, these statistical hypotheses are generally assumptions about the value of the population parameter; a hypothesis specifies a single value or a range of values for the parameter. Two different hypotheses are set up rather than a single one. These two hypotheses are generally referred to as (1) the null hypothesis denoted by H₀ and (2) the alternative hypothesis denoted by H₁.


The null hypothesis is the hypothesis of the population parameter taking a specified value. In case of two populations, the null hypothesis is of no difference or of the difference taking a specified value. The hypothesis that is different from the null hypothesis is the alternative hypothesis. If the null hypothesis H₀ is rejected based upon the sample information, the alternative hypothesis H₁ is accepted. Therefore, the two hypotheses are constructed in such a way that if one is true, the other one is false and vice versa.


Setting up of a suitable significance level: The next step is to choose a suitable level of significance. The level of significance, denoted by α, is chosen before drawing any sample. The level of significance denotes the probability of rejecting the null hypothesis when it is true. The value of α varies from problem to problem, but usually it is taken as either 5 per cent or 1 per cent. A 5 per cent level of significance means that there are 5 chances out of hundred that a null hypothesis will get rejected when it should be accepted. When the null hypothesis is rejected at any level of significance, the test result is said to be significant. Further, if a hypothesis is rejected at 1 per cent level, it must also be rejected at 5 per cent significance level.


Determination of a test statistic: The next step is to determine a suitable test statistic and its distribution. As would be seen later, the test statistic could be t, Z, χ² or F, depending upon various assumptions to be discussed later in the book.


Determination of critical region: Before a sample is drawn from the population, it is very important to specify the values of the test statistic that will lead to rejection or acceptance of the null hypothesis. The set of values that leads to the rejection of the null hypothesis is called the critical region. Given a level of significance α, the optimal critical region for a two-tailed test consists of an area of α/2 in the right hand tail of the distribution plus an area of α/2 in the left hand tail, where the null hypothesis is rejected.


Computing the value of test-statistic: The next step is to compute the 
value of the test statistic based upon a random sample of size n. Once the value of 
test statistic is computed, one needs to examine whether the sample results fall in 
the critical region or in the acceptance region. 


Making decision: The hypothesis may be rejected or accepted depending 
upon whether the value of the test statistic falls in the rejection or the acceptance 
region. Management decisions are based upon the statistical decision of either 
rejecting or accepting the null hypothesis. 


In case a hypothesis is rejected, the difference between the sample statistic and the hypothesized population parameter is considered to be significant. On the other hand, if the hypothesis is accepted, the difference between the sample statistic and the hypothesized population parameter is not regarded as significant and can be attributed to chance.
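
The last three steps (critical region, computed statistic, decision) can be sketched for a Z statistic as follows. This is a minimal illustration, assuming the test statistic has already been computed; the function name `z_test_decision` and the `tail` labels are hypothetical choices for this sketch.

```python
from statistics import NormalDist


def z_test_decision(z, alpha=0.05, tail="two"):
    """Return the decision and the critical value for a computed Z statistic.

    tail: "two" for H1: mu != mu0, "left" for H1: mu < mu0, "right" for H1: mu > mu0.
    """
    std_normal = NormalDist()
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # Z_{alpha/2}; region is |Z| > crit
        reject = abs(z) > crit
    elif tail == "left":
        crit = std_normal.inv_cdf(alpha)          # -Z_alpha (a negative number)
        reject = z < crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)      # Z_alpha
        reject = z > crit
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return ("reject H0" if reject else "accept H0"), crit


decision, crit = z_test_decision(2.5, alpha=0.05, tail="two")
print(decision, round(crit, 3))
```

Fixing α and the tail before looking at the data mirrors the order of the steps above: the critical region is specified first, and the computed statistic is only compared against it afterwards.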


10.2.2 Test Statistic for Testing Hypothesis about Population Mean


In this section, we will take up the test of hypothesis about population mean in a 
case of single population. 


One of the important things to keep in mind is the use of an appropriate test statistic. In case the sample size is large (n > 30), the Z statistic would be used. For a small sample size (n ≤ 30), a further question regarding the knowledge of the population standard deviation (σ) is asked. If the population standard deviation σ is known, a Z statistic can be used. However, if σ is unknown and is estimated using sample data, a t test with appropriate degrees of freedom is used under the assumption that the sample is drawn from a normal population. It is assumed that you have the knowledge of Z and t distributions from the course on statistics. However, these would be briefly reviewed at the appropriate place. Table 10.1 summarizes the appropriateness of the test statistic for conducting a test of hypothesis regarding the population mean.


Table 10.1 Appropriateness of Test Statistic in Testing Hypotheses about Means


Sample Size         Knowledge of Population Standard Deviation (σ)
                    Known         Not Known
Large (n > 30)      Z             Z
Small (n ≤ 30)      Z             t
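
The selection rule described above can be written as a small function. This is only a sketch of the rule as stated in the text (large sample: Z; small sample with σ known: Z; small sample with σ unknown: t, assuming a normal population); the name `choose_test_statistic` is hypothetical.

```python
def choose_test_statistic(n, sigma_known):
    """Pick 'Z' or 't' for a test about a single population mean.

    n: sample size; sigma_known: True if the population standard deviation is known.
    The small-sample t case assumes the sample comes from a normal population.
    """
    if n > 30:
        return "Z"                       # large sample: Z whether or not sigma is known
    return "Z" if sigma_known else "t"   # small sample: depends on knowledge of sigma


print(choose_test_statistic(200, sigma_known=False))
print(choose_test_statistic(10, sigma_known=False))
```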


Check Your Progress 


1. What is a two-tailed test? 


2. Mention the symbol through which the probability of committing a Type II 
error is denoted. 


3. What is the critical region?


10.3 TESTS CONCERNING MEANS-THE CASE OF 
SINGLE POPULATION 


In this section, a number of illustrations will be taken up to explain the test of hypothesis concerning the mean. Two cases, of large and small samples, will be taken up.


Case of large sample


As mentioned earlier, a Z test is appropriate when the sample size n is large, or when n is small but the value of the population standard deviation is known. There can be alternate cases of two-tailed and one-tailed tests of hypotheses.


Corresponding to the null hypothesis $H_0 : \mu = \mu_{H_0}$, the following criteria could be used as shown in Table 10.2.


The test statistic is given by,

$$ Z = \frac{\bar{X} - \mu_{H_0}}{\sigma / \sqrt{n}} $$

Where,
$\bar{X}$ = Sample mean
σ = Population standard deviation
$\mu_{H_0}$ = The value of μ under the assumption that the null hypothesis is true
n = Size of sample


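Table 10.2 Criteria for Accepting or Rejecting Null Hypothesis under Different Cases of Alternative Hypotheses


S. No.   Alternative Hypothesis    Reject the Null Hypothesis if                        Accept the Null Hypothesis if
1.       $\mu < \mu_{H_0}$         $Z < -Z_{\alpha}$                                    $Z \ge -Z_{\alpha}$
2.       $\mu > \mu_{H_0}$         $Z > Z_{\alpha}$                                     $Z \le Z_{\alpha}$
3.       $\mu \ne \mu_{H_0}$       $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$            $-Z_{\alpha/2} \le Z \le Z_{\alpha/2}$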

If the population standard deviation σ is unknown, the sample standard deviation

$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2} $$

is used as an estimate of σ. It may be noted that $Z_{\alpha}$ and $Z_{\alpha/2}$ are Z values such that the area to the right under the standard normal distribution is α and α/2 respectively.
Below are solved examples using the above concepts. 


Example 10.1: A sample of 200 bulbs made by a company gives a mean lifetime of 1,540 hours with a standard deviation of 42 hours. Is it likely that the sample has been drawn from a population with a mean lifetime of 1,500 hours? You may use 5 per cent level of significance.


Solution: 


In the above example, the sample size is large (n = 200), the sample mean ($\bar{X}$) equals 1,540 hours and the sample standard deviation (s) is equal to 42 hours. The null and alternative hypotheses can be written as:

H₀ : μ = 1,500 hrs
H₁ : μ ≠ 1,500 hrs

It is a two-tailed test with the level of significance (α) equal to 0.05. Since n is large (n > 30), a Z test can be used even though the population standard deviation σ is unknown. The test statistic is given by:

$$ Z = \frac{\bar{X} - \mu_{H_0}}{\hat{\sigma}_{\bar{X}}} $$

Where,
$\mu_{H_0}$ = Value of μ under the assumption that the null hypothesis is true
$\hat{\sigma}_{\bar{X}}$ = Estimated standard error of mean

Here $\mu_{H_0}$ = 1,500 and

$$ \hat{\sigma}_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{n}} = \frac{s}{\sqrt{n}} = \frac{42}{\sqrt{200}} = 2.97 $$

(Note that $\hat{\sigma}$ is the estimated value of σ.)

$$ Z = \frac{\bar{X} - \mu_{H_0}}{s / \sqrt{n}} = \frac{1540 - 1500}{2.97} = \frac{40}{2.97} = 13.47 $$

The value of α = 0.05 and, since it is a two-tailed test, the critical values of Z are given by $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, that is, -1.96 and 1.96, which could be obtained from the standard normal Table 7.1 given in Unit 7.

[Figure: Rejection regions for Example 10.1, in both tails of the standard normal distribution]


Since the computed value of Z= 13.47 lies in the rejection region, the null hypothesis 
is rejected. Therefore, it can be concluded that the average life of the bulb is 
significantly different from 1,500 hours. 
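
The arithmetic of Example 10.1 can be verified with a short script. This is a sketch using only the figures given in the example; the two-tailed critical value of 1.96 at the 5 per cent level is the standard normal table value.

```python
import math

# Figures given in Example 10.1.
n, x_bar, s, mu_0 = 200, 1540, 42, 1500

standard_error = s / math.sqrt(n)    # estimated standard error of the mean
z = (x_bar - mu_0) / standard_error  # computed Z statistic

z_critical = 1.96                    # two-tailed critical value at alpha = 0.05
reject_null = abs(z) > z_critical

print(round(standard_error, 2), round(z, 2), reject_null)
```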


Example 10.2: On a typing test, a random sample of 36 graduates of a secretarial 
school averaged 73.6 words with a standard deviation of 8.10 words per minute. 
Test an employer’s claim that the school’s graduates average less than 75.0 words 
per minute using the 5 per cent level of significance. 


Solution: 
H₀ : μ = 75
H₁ : μ < 75

$\bar{X}$ = 73.6, s = 8.10, n = 36 and α = 0.05. As the sample size is large (n > 30), a Z test is appropriate even though the population standard deviation σ is unknown.


The test statistic is given by:

$$ Z = \frac{\bar{X} - \mu_{H_0}}{\hat{\sigma}_{\bar{X}}} = \frac{73.6 - 75}{1.35} = \frac{-1.4}{1.35} = -1.04 $$

$$ \left( \hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{8.1}{\sqrt{36}} = \frac{8.1}{6} = 1.35 \right) $$

Since it is a one-tailed test and the interest is in the left hand tail of the distribution, the critical value of Z is given by $-Z_{\alpha}$ = -1.645. The computed value of Z = -1.04 lies in the acceptance region, so the null hypothesis is accepted, as shown below:

[Figure: Rejection region for Example 10.2, the lower tail below $-Z_{\alpha}$ = -1.645; the sample value -1.04 lies in the acceptance region]
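
A quick numerical check of Example 10.2 is sketched below. It uses only the figures from the example and obtains the left-tail critical value from Python's standard library rather than a printed table.

```python
import math
from statistics import NormalDist

# Figures given in Example 10.2.
n, x_bar, s, mu_0, alpha = 36, 73.6, 8.10, 75.0, 0.05

standard_error = s / math.sqrt(n)         # 8.10 / 6
z = (x_bar - mu_0) / standard_error       # computed Z statistic

z_critical = NormalDist().inv_cdf(alpha)  # left-tail critical value, -Z_alpha
reject_null = z < z_critical

print(round(standard_error, 2), round(z, 2), round(z_critical, 3), reject_null)
```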


Case of small sample 


In case the sample size is small (n ≤ 30) and is drawn from a normal population with unknown standard deviation σ, a t test is used to test the hypothesis about the mean. The t distribution is a symmetrical distribution just like the normal one. However, the t distribution is higher at the tails and lower at the peak; that is, it is flatter than the normal distribution. With an increase in the sample size (and hence degrees of freedom), the t distribution loses its flatness and approaches the normal distribution; for n > 30 the two are close. A comparative shape of t and normal distribution is given in Figure 10.2.


[Figure: curves of the t distribution and the Z distribution]


Fig. 10.2 Shape of t and Normal Distribution


The procedure for testing the hypothesis of a mean is similar to what is explained in the case of large sample. The test statistic used in this case is:

$$ t_{n-1} = \frac{\bar{X} - \mu_{H_0}}{\hat{\sigma}_{\bar{X}}} $$

Where,
$\hat{\sigma}_{\bar{X}} = s / \sqrt{n}$ (where s = Sample standard deviation)
n - 1 = degrees of freedom


A few examples pertaining to the t test are worked out for testing the hypothesis of mean in case of a small sample.

Example 10.3: Prices of a share (in ₹) of a company on different days in a month were found to be 66, 65, 69, 70, 69, 71, 70, 63, 64 and 68. Examine whether the mean price of the share in the month is different from 65. You may use 10 per cent level of significance.


Solution: 

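H₀ : μ = 65
H₁ : μ ≠ 65

Since the sample size is n = 10, which is small, and the population standard deviation is unknown, the appropriate test in this case would be t. First of all, we need to estimate the value of the sample mean ($\bar{X}$) and the sample standard deviation (s). It is known that the sample mean and the standard deviation are given by the following formulae:

$$ \bar{X} = \frac{\sum X_i}{n}, \qquad s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}} $$

The computation of $\bar{X}$ and s is shown in Table 10.3.

Table 10.3 Computation of Sample Mean and Standard Deviation

X        X - X̄      (X - X̄)²
66       -1.5        2.25
65       -2.5        6.25
69        1.5        2.25
70        2.5        6.25
69        1.5        2.25
71        3.5       12.25
70        2.5        6.25
63       -4.5       20.25
64       -3.5       12.25
68        0.5        0.25
Total
675       0         70.50

$$ \bar{X} = \frac{675}{10} = 67.5, \qquad s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{70.5}{9} = 7.83, \qquad s = \sqrt{7.83} = 2.80 $$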


The test statistic is given by:

$$ t_{n-1} = \frac{\bar{X} - \mu_{H_0}}{\hat{\sigma}_{\bar{X}}} = \frac{\bar{X} - \mu_{H_0}}{s / \sqrt{n}} = \frac{67.5 - 65}{2.8 / \sqrt{10}} = \frac{2.5 \times \sqrt{10}}{2.8} = \frac{2.5 \times 3.16}{2.8} = \frac{7.91}{2.8} = 2.82 $$

The critical values of t with 9 degrees of freedom for a two-tailed test at 10 per cent level of significance are -1.833 and 1.833. Since the computed value of t lies in the rejection region (see figure below), the null hypothesis is rejected.

[Figure: Rejection regions for Example 10.3, in both tails beyond -1.833 and 1.833; the sample value 2.82 lies in the upper rejection region]

Therefore, the average price of the share of the company is different from 65.
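
The computations of Example 10.3 can be reproduced from the raw prices. The sketch below uses Python's standard library, whose `stdev` divides by n - 1 as in the formula for s; the critical value 1.833 is the one quoted in the text, since the standard library has no t distribution.

```python
import math
from statistics import mean, stdev

# Share prices given in Example 10.3.
prices = [66, 65, 69, 70, 69, 71, 70, 63, 64, 68]
mu_0 = 65

n = len(prices)
x_bar = mean(prices)                 # sample mean
s = stdev(prices)                    # sample standard deviation (divisor n - 1)
t = (x_bar - mu_0) / (s / math.sqrt(n))

t_critical = 1.833                   # two-tailed, 10 per cent level, 9 d.f. (from the text)
reject_null = abs(t) > t_critical

print(x_bar, round(s, 2), round(t, 2), reject_null)
```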


Example 10.4: Past records indicate that a golfer has averaged 8.2 on a certain course. With a new set of clubs, he averages 7.9 over five rounds with a standard deviation of 2.65. Can we conclude, at 0.025 level of significance, that the new clubs have an adverse effect on the performance?


Solution:
H₀ : μ = 8.2
H₁ : μ < 8.2


$\bar{X}$ = 7.9, n = 5, s = 2.65, α = 0.025. As the population standard deviation is unknown and the sample size is small (n < 30), a t test would be appropriate. The test statistic is given by:

$$ t_{n-1} = \frac{\bar{X} - \mu_{H_0}}{\hat{\sigma}_{\bar{X}}} = \frac{\bar{X} - \mu_{H_0}}{s / \sqrt{n}} = \frac{7.9 - 8.2}{1.185} = \frac{-0.3}{1.185} = -0.25 $$

$$ \left( \hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{2.65}{\sqrt{5}} = 1.185 \right) $$

The critical value of t at 0.025 level of significance with four degrees of freedom is given by -t = -2.776 (see Table 10.4). As the sample t value of -0.25 lies in the acceptance region, the null hypothesis is accepted (see figure below).

[Figure: Rejection region for Example 10.4, the lower tail below -2.776; the sample value -0.25 lies in the acceptance region]

Therefore, there is no adverse effect on the performance due to a change in the clubs, and the difference in performance can be attributed to chance.
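
The t value in Example 10.4 can be checked as below. Note the assumption: the sketch takes the hypothesized mean as 8.2 and the sample mean as 7.9, which are the values implied by the worked arithmetic (a difference of 0.3 and a result of -0.25); the critical value -2.776 is the one quoted from Table 10.4.

```python
import math

# Values implied by the worked arithmetic of Example 10.4 (assumed: mu_0 = 8.2, x_bar = 7.9).
mu_0, x_bar, s, n = 8.2, 7.9, 2.65, 5

standard_error = s / math.sqrt(n)
t = (x_bar - mu_0) / standard_error

t_critical = -2.776                  # lower-tail value at 0.025 with 4 d.f. (Table 10.4)
reject_null = t < t_critical

print(round(standard_error, 3), round(t, 2), reject_null)
```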
Table 10.4 Some Critical Values of t


Level of Significance


Degrees of Freedom 1% 5% 10%


1 63.657 12.706 6.314
2 9.925 4.303 2.920 

3 5.841 3.182 2.353 

4 4.604 2.776 2.132 

5 

6 

7 

8 


9 
10 
11 
12 
13 


18 2.878 2.101 1.734 


23 
24 
25 
26 


Note: These table values of t are in respect of two-tailed tests. If we use the t distribution for a one-tailed test, we are interested in the area located in one tail. So, to find the appropriate t value for a one-tailed test, say at a 5% level with 12 degrees of freedom, we should look in the above table under the 10% column opposite the 12 degrees of freedom row. (This value will be 1.782.) This is true because the 10% column represents 10% of the area under the curve contained in both tails combined, and so it also represents 5% of the area under the curve contained in each of the tails separately.


10.4 TESTS FOR DIFFERENCE BETWEEN TWO 
POPULATION MEANS 


So far, we have been concerned with the testing of means of a single population. We took up the cases of both large and small samples. It would be interesting to examine the difference between two population means. Again, various cases would be examined as discussed below:


Case of large sample 


In case both the sample sizes are greater than 30, a Z test is used. The hypothesis to be tested may be written as:

H₀ : μ₁ = μ₂
H₁ : μ₁ ≠ μ₂

Where,
μ₁ = mean of population 1
μ₂ = mean of population 2

The above is a case of two-tailed test. The test statistic used is:

$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{H_0}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$

$\bar{X}_1$ = Mean of sample drawn from population 1
$\bar{X}_2$ = Mean of sample drawn from population 2
n₁ = size of sample drawn from population 1
n₂ = size of sample drawn from population 2

If σ₁ and σ₂ are unknown, their estimates $\hat{\sigma}_1$ and $\hat{\sigma}_2$ are used:

$$ \hat{\sigma}_1 = s_1 = \sqrt{\frac{1}{n_1 - 1}\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2} $$

$$ \hat{\sigma}_2 = s_2 = \sqrt{\frac{1}{n_2 - 1}\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2} $$

The Z value for the problem can be computed using the above formula and 
compared with the table value to either accept or reject the hypothesis. Let us 
consider the following problem: 


Example 10.5: A study is carried out to examine whether the mean hourly wages 
of the unskilled workers in the two cities—Ambala Cantt and Lucknow are the 
same. The random sample of hourly earnings in both the cities is taken and the 
results are presented in the Table 10.5. 


Table 10.5 Survey Data on Hourly Earnings in Two Cities


City            Sample Mean Hourly Earnings    Standard Deviation of Sample    Sample Size
Ambala Cantt    ₹8.95 ($\bar{X}_1$)            0.40 (s₁)                       200 (n₁)
Lucknow         ₹9.10 ($\bar{X}_2$)            0.60 (s₂)                       175 (n₂)


Using a 5 per cent level of significance, test the hypothesis of no difference 
in the average wages of unskilled workers in the two cities. 


Solution: We use subscripts 1 and 2 for Ambala Cantt and Lucknow respectively.

H₀ : μ₁ = μ₂ ⇒ μ₁ - μ₂ = 0
H₁ : μ₁ ≠ μ₂ ⇒ μ₁ - μ₂ ≠ 0

The following survey data is given:
$\bar{X}_1$ = 8.95, $\bar{X}_2$ = 9.10, s₁ = 0.40, s₂ = 0.60, n₁ = 200, n₂ = 175, α = 0.05

Since both n₁ and n₂ are greater than 30 and the sample standard deviations are given, a Z test would be appropriate.

The test statistic is given by:

$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{H_0}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$

As σ₁ and σ₂ are unknown, their estimates $\hat{\sigma}_1 = s_1$ and $\hat{\sigma}_2 = s_2$ would be used.

$$ \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}} = \sqrt{\frac{(0.4)^2}{200} + \frac{(0.6)^2}{175}} = \sqrt{0.00286} \approx 0.053 $$

$$ Z = \frac{(8.95 - 9.10) - 0}{0.053} = -2.83 $$

As the problem is of a two-tailed test, the critical values of Z at 5 per cent level of significance are given by $-Z_{\alpha/2}$ = -1.96 and $Z_{\alpha/2}$ = 1.96. The sample value of Z = -2.83 lies in the rejection region, as shown in the figure below, so the null hypothesis of no difference in the average wages is rejected.

[Figure: Rejection regions for Example 10.5, in both tails beyond -1.96 and 1.96; the sample value -2.83 lies in the lower rejection region]
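
The two-sample Z statistic of Example 10.5 can be recomputed as sketched below, using only the survey figures from Table 10.5. Carried at full precision the statistic comes out at about -2.81 rather than -2.83; the small gap arises because the worked solution rounds the standard error to 0.053 before dividing. The decision is the same either way.

```python
import math

# Survey figures from Table 10.5 (1 = Ambala Cantt, 2 = Lucknow).
x_bar_1, s_1, n_1 = 8.95, 0.40, 200
x_bar_2, s_2, n_2 = 9.10, 0.60, 175

# Standard error of the difference between the two sample means.
standard_error = math.sqrt(s_1 ** 2 / n_1 + s_2 ** 2 / n_2)
z = ((x_bar_1 - x_bar_2) - 0) / standard_error   # hypothesized difference is 0

z_critical = 1.96                                # two-tailed, 5 per cent level
reject_null = abs(z) > z_critical

print(round(standard_error, 4), round(z, 2), reject_null)
```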


Case of small sample 


If the size of both the samples is less than 30 and the population standard deviations are unknown, the Z procedure described above for testing the equality of two population means is not applicable. A t test is used instead, under one of two assumptions:

(a) Two population variances are equal.
(b) Two population variances are not equal.

Population variances are equal

If the two population variances are equal, their respective unbiased estimates are treated as estimates of a common variance. In such a case, the expression for the standard error becomes:

$$ \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}} = \sqrt{\frac{\hat{\sigma}^2}{n_1} + \frac{\hat{\sigma}^2}{n_2}} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

(Assuming $\hat{\sigma}_1^2 = \hat{\sigma}_2^2 = \hat{\sigma}^2$)

To get an estimate of $\hat{\sigma}^2$, a weighted average of $s_1^2$ and $s_2^2$ is used, where the weights are the number of degrees of freedom of each sample. The weighted average is called a 'pooled estimate' of σ². This pooled estimate is given by the expression:

$$ \hat{\sigma}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

The testing procedure could be explained as under:

H₀ : μ₁ = μ₂ ⇒ μ₁ - μ₂ = 0
H₁ : μ₁ ≠ μ₂ ⇒ μ₁ - μ₂ ≠ 0

In this case, the test statistic t is given by the expression:

$$ t_{n_1 + n_2 - 2} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} $$

Where,

$$ \hat{\sigma} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

Once the value of the t statistic is computed from the sample data, it is compared with the tabulated value at a level of significance α to arrive at a decision regarding the acceptance or rejection of the hypothesis. Let us work out a problem illustrating the concepts defined above.


Example 10.6: Two drugs meant to provide relief to arthritis sufferers were 
produced in two different laboratories. The first drug was administered to a group 
of 12 patients and produced an average of 8.5 hours of relief with a standard 
deviation of 1.8 hours. The second drug was tested on a sample of 8 patients and 
produced an average of 7.9 hours of relief with a standard deviation of 2.1 hours. 
Test the hypothesis that the first drug provides a significantly higher period of 
relief. You may use a 5 per cent level of significance.


Solution: Let the subscripts 1 and 2 refer to drug 1 and drug 2 respectively. 
$H_0: \mu_1 = \mu_2 \Rightarrow \mu_1 - \mu_2 = 0$
$H_1: \mu_1 > \mu_2 \Rightarrow \mu_1 - \mu_2 > 0$


The following survey data is given:
$\bar X_1 = 8.5$, $\bar X_2 = 7.9$, $s_1 = 1.8$, $s_2 = 2.1$, $n_1 = 12$, $n_2 = 8$

As both $n_1$ and $n_2$ are small and the population standard deviations are unknown,
one may use a t test with degrees of freedom $= n_1 + n_2 - 2 = 12 + 8 - 2 = 18$.


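The test statistic is given by:

$$t_{n_1+n_2-2} = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)_{H_0}}{\hat\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Where,

$$\hat\sigma = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(12 - 1)(1.8)^2 + (8 - 1)(2.1)^2}{12 + 8 - 2}} = \sqrt{\frac{35.64 + 30.87}{18}} = \sqrt{\frac{66.51}{18}} = 1.92$$

$$t = \frac{(8.5 - 7.9) - 0}{1.92\sqrt{\frac{1}{12} + \frac{1}{8}}} = \frac{0.6}{1.92\sqrt{0.2083}} = \frac{0.6}{1.92 \times 0.456} = \frac{0.6}{0.8755} = 0.685$$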

The critical value of t with 18 degrees of freedom at 5 per cent level of
significance is 1.734. The sample value of t = 0.685 lies in the acceptance
region as shown in the figure below:

[Figure: Rejection region for Example 10.6. The critical value $t_{0.05} = 1.734$ separates the acceptance region from the rejection region; the sample value lies in the acceptance region.]


Therefore, the null hypothesis is accepted as there is not enough evidence to reject 
it. Therefore, one may conclude that the first drug is not significantly more effective 
than the second drug. 
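
The pooled-variance computation of Example 10.6 can be sketched in Python as follows. This is a minimal illustration and not part of the original solution: the function name `pooled_t` is hypothetical, only the standard library is used, and the critical value 1.734 is hard-coded from the example rather than looked up. Because the sketch does not round intermediate results, it gives t of about 0.684 rather than the hand-computed 0.685.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2, diff_h0=0.0):
    """Two-sample t statistic assuming equal population variances."""
    df = n1 + n2 - 2
    # Pooled variance: the sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2 - diff_h0) / std_err
    return t, df


# Summary figures from Example 10.6 (drug 1 versus drug 2).
t, df = pooled_t(8.5, 1.8, 12, 7.9, 2.1, 8)
critical = 1.734  # one-tailed 5 per cent value for 18 d.f., as quoted in the example
decision = "reject H0" if t > critical else "do not reject H0"
print(f"t = {t:.3f}, d.f. = {df}, decision: {decision}")
```

The decision rule compares t with a single upper critical value because the alternative hypothesis in the example is one-sided.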


When population variances are not equal 


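In case population variances are not equal, the test statistic for testing the equality
of two population means when the sample sizes are small is given by:

$$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)_{H_0}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

The degrees of freedom in such a case is given by the expression:

$$\text{d.f.} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$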


The procedure for testing of hypothesis remains the same as was discussed 
when the variances of two populations were assumed to be same. Let us consider 
an example to illustrate the same. 


Example 10.7: There were two types of drugs (1 and 2) that were tried on some 
patients for reducing weight. There were 8 adults who were subjected to drug 1 
and seven adults who were administered drug 2. The decrease in weight (in pounds) 
is given below: 


Drug 1: [values not legible in the source]
Drug 2: [values not legible in the source]


Do the drugs differ significantly in their effect on decreasing weight? You 
may use 5 per cent level of significance. Assume that the variances of two 
populations are not same. 


Solution: 
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$


Let us compute the sample means and standard deviations of the two samples
as shown in Table 10.6.


Table 10.6 Intermediate computations for sample means and standard deviations 


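[Table 10.6 columns: $X_1$ | $X_2$ | $(X_1 - \bar X_1)$ | $(X_2 - \bar X_2)$ | $(X_1 - \bar X_1)^2$ | $(X_2 - \bar X_2)^2$. The individual rows are not legible in the source. The column totals used in the computations are $\sum X_1 = 90$, $\sum X_2 = 70$, $\sum (X_1 - \bar X_1)^2 = 55.5$ and $\sum (X_2 - \bar X_2)^2 = 38$.]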
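$n_1 = 8$, $n_2 = 7$

$$\bar X_1 = \frac{\sum X_1}{n_1} = \frac{90}{8} = 11.25, \qquad \bar X_2 = \frac{\sum X_2}{n_2} = \frac{70}{7} = 10$$

$$s_1^2 = \frac{\sum (X_1 - \bar X_1)^2}{n_1 - 1} = \frac{55.5}{7} = 7.93$$

$$s_2^2 = \frac{\sum (X_2 - \bar X_2)^2}{n_2 - 1} = \frac{38}{6} = 6.33$$

$$\hat\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{7.93}{8} + \frac{6.33}{7}} = \sqrt{0.99 + 0.90} = \sqrt{1.89} = 1.37$$

$$\text{d.f.} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = \frac{\left(\frac{7.93}{8} + \frac{6.33}{7}\right)^2}{\frac{1}{7}\left(\frac{7.93}{8}\right)^2 + \frac{1}{6}\left(\frac{6.33}{7}\right)^2} = \frac{3.593}{0.140 + 0.136} \approx 13$$

$$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)_{H_0}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(11.25 - 10) - 0}{1.37} = \frac{1.25}{1.37} = 0.912$$

The table value (critical value) of t with 13 degrees of freedom at 5 per cent
level of significance is 2.16. As the computed t is less than the tabulated t, there
is not enough evidence to reject $H_0$.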
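
The unequal-variance statistic and its degrees of freedom can be checked with a short Python sketch. It is an illustration under stated assumptions, not the book's procedure: the function name `welch_t` is hypothetical, and the inputs are the summary figures of Example 10.7 (means 11.25 and 10, sums of squared deviations 55.5 and 38). Without intermediate rounding the sketch gives t of about 0.908 and roughly 12.99 degrees of freedom, which rounds to the 13 used above.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2, diff_h0=0.0):
    """Two-sample t statistic and approximate d.f. for unequal variances."""
    a = var1 / n1  # squared standard error contributed by sample 1
    b = var2 / n2  # squared standard error contributed by sample 2
    t = (mean1 - mean2 - diff_h0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Sample variances from the sums of squared deviations in Example 10.7.
var1 = 55.5 / (8 - 1)
var2 = 38 / (7 - 1)
t, df = welch_t(11.25, var1, 8, 10.0, var2, 7)
critical = 2.16  # two-tailed 5 per cent value for 13 d.f., as quoted in the example
decision = "reject H0" if abs(t) > critical else "do not reject H0"
print(f"t = {t:.3f}, d.f. = {df:.2f}, decision: {decision}")
```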


Check Your Progress 


4. Which type of test is used to test a hypothesis about the mean when the
sample size is small (n < 30) and the sample is drawn from a normal
population with unknown standard deviation σ?

5. How are the degrees of freedom in the two sample t test for testing the
equality of means given?
10.5 TESTS CONCERNING POPULATION
PROPORTION - THE CASE OF SINGLE
POPULATION


We have already discussed the tests concerning population means. In the tests 
about proportion, one is interested in examining whether the respondents possess 
a particular attribute or not. 


The random variable in such a case is binary in the sense that it takes only
two values, yes or no. For example, a student either is a smoker or is not, a
consumer either uses a particular brand of product or does not, and a skilled
worker is either satisfied with the present job or not. At this stage it may be
recalled that the binomial distribution is the theoretically correct distribution to use
while dealing with proportions. Further, as the sample size increases, the binomial
distribution approaches the normal distribution in characteristic. To be specific,
whenever both np and nq (where n = number of trials, p = probability of success
and q = probability of failure) are at least 5, one can use the normal distribution as
a substitute for the binomial distribution.


The case of single population proportion 


Suppose we want to test the hypotheses, 
$H_0: p = p_0$
$H_1: p \neq p_0$


For large sample, the appropriate test statistic would be: 


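$$Z = \frac{\bar p - p_{H_0}}{\sigma_{\bar p}}$$

Where,

$\bar p$ = sample proportion
$p_{H_0}$ = the value of p under the assumption that null hypothesis is true
$\sigma_{\bar p}$ = standard error of sample proportion

The value of $\sigma_{\bar p}$ is computed by using the following formula:

$$\sigma_{\bar p} = \sqrt{\frac{p_{H_0}\, q_{H_0}}{n}}$$

Where, $q_{H_0} = 1 - p_{H_0}$

n = sample size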


For a given level of significance α, the computed value of Z is compared
with the corresponding critical values, i.e. $Z_{\alpha/2}$ or $-Z_{\alpha/2}$, to accept or reject the null
hypothesis. We will consider a few examples to explain the testing procedure for
a single population proportion.

Example 10.8: An officer of the health department claims that 60 per cent of the 
male population of a village comprises smokers. A random sample of 50 males 
showed that 35 of them were smokers. Are these sample results consistent with 
the claim of the health officer? Use a level of significance of 0.05. 


Solution: 
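Sample size (n) = 50

Sample proportion $= \bar p = \frac{x}{n} = \frac{35}{50} = 0.70$

$H_0: p = 0.60$
$H_1: p > 0.60$

The test statistic is given by:

$$Z = \frac{\bar p - p_{H_0}}{\sigma_{\bar p}} = \frac{0.70 - 0.60}{0.069} = \frac{0.10}{0.069} = 1.44$$

where

$$\sigma_{\bar p} = \sqrt{\frac{p_{H_0}\, q_{H_0}}{n}} = \sqrt{\frac{0.6 \times 0.4}{50}} = \sqrt{\frac{0.24}{50}} = 0.069$$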


It is a one-tailed test. For a given level of significance α = 0.05, the critical
value of Z is given by $Z_\alpha = Z_{0.05} = 1.645$. It is seen that the sample value of
Z = 1.44 lies in the acceptance region as shown below (see figure).


[Figure: Rejection region for Example 10.8. The sample value 1.44 lies in the acceptance region, to the left of the critical value $Z_\alpha = 1.645$.]


Therefore, there is not enough evidence to reject the null hypothesis. So it can be 
concluded that the proportion of male smokers is not statistically different from 
0.60. 
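
A short Python sketch of the single-proportion test is given below. It is illustrative only: the function names `one_proportion_z` and `normal_approx_ok` are hypothetical, and the critical value 1.645 is hard-coded from Example 10.8. The second helper checks the rule that both np and nq should be at least 5 before the normal approximation is used.

```python
import math


def normal_approx_ok(n, p0, minimum=5):
    """Rule of thumb: both n*p and n*q must be at least `minimum`."""
    return n * p0 >= minimum and n * (1 - p0) >= minimum


def one_proportion_z(successes, n, p0):
    """Z statistic for a single proportion, with the standard error under H0."""
    p_bar = successes / n
    std_err = math.sqrt(p0 * (1 - p0) / n)
    return (p_bar - p0) / std_err


# Example 10.8: 35 smokers in a sample of 50, hypothesised proportion 0.60.
approx_ok = normal_approx_ok(50, 0.60)
z = one_proportion_z(35, 50, 0.60)
critical = 1.645  # one-tailed 5 per cent value, as quoted in the example
decision = "reject H0" if z > critical else "do not reject H0"
print(f"normal approximation usable: {approx_ok}")
print(f"Z = {z:.2f}, decision: {decision}")
```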
10.6 TESTS FOR DIFFERENCE BETWEEN TWO
POPULATION PROPORTIONS

Here, the interest is to test whether the two population proportions are equal or
not. The hypothesis under investigation is:
$H_0: p_1 = p_2 \Rightarrow p_1 - p_2 = 0$
$H_1: p_1 \neq p_2 \Rightarrow p_1 - p_2 \neq 0$


The alternative hypothesis assumed is two sided. It could as well have been 
one sided. The test statistic is given by: 


$$Z = \frac{(\bar p_1 - \bar p_2) - (p_1 - p_2)_{H_0}}{\sigma_{\bar p_1 - \bar p_2}}$$

Where,

$\bar p_1$ = Sample proportion possessing a particular attribute from population 1

$\bar p_2$ = Sample proportion possessing a particular attribute from population 2

$\sigma_{\bar p_1 - \bar p_2}$ = Standard error of difference between proportions.

$(p_1 - p_2)_{H_0}$ = Value of difference between population proportions under
the assumption that the null hypothesis is true.

The formula for $\sigma_{\bar p_1 - \bar p_2}$ is given by:

$$\sigma_{\bar p_1 - \bar p_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$


We do not know the value of $p_1$, $p_2$, etc., but under the null hypothesis
$p_1 = p_2 = p$. Therefore,

$$\sigma_{\bar p_1 - \bar p_2} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}} = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The best estimate of p is given by:

$$\hat p = \frac{x_1 + x_2}{n_1 + n_2}$$

Where,
$x_1$ = Number of successes in sample 1
$x_2$ = Number of successes in sample 2
$n_1$ = Size of sample taken from population 1
$n_2$ = Size of sample taken from population 2

It is known that $\bar p_1 = \frac{x_1}{n_1}$ and $\bar p_2 = \frac{x_2}{n_2}$.

Therefore, $x_1 = n_1 \bar p_1$ and $x_2 = n_2 \bar p_2$

Therefore, $\hat p = \frac{n_1 \bar p_1 + n_2 \bar p_2}{n_1 + n_2}$


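Therefore, the estimate of standard error of difference between the two
proportions is given by:

$$\hat\sigma_{\bar p_1 - \bar p_2} = \sqrt{\hat p \hat q\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Where $\hat p$ is as defined above and $\hat q = 1 - \hat p$. Now, the test statistic may
be rewritten as:

$$Z = \frac{(\bar p_1 - \bar p_2) - (p_1 - p_2)_{H_0}}{\sqrt{\hat p \hat q\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$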


Now, for a given level of significance α, the sample Z value is compared
with the critical Z value to accept or reject the null hypothesis. We consider below
an example to illustrate the testing procedure described above.


Example 10.9: A company is interested in considering two different television 
advertisements for the promotion of a new product. The management believes 
that advertisement A is more effective than advertisement B. Two test market 
areas with virtually identical consumer characteristics are selected. Advertisement 
A is used in one area and advertisement B in the other area. In a random sample 
of 60 consumers who saw advertisement A, 18 tried the product. In a random 
sample of 100 customers who saw advertisement B, 22 tried the product. Does 
this indicate that advertisement A is more effective than advertisement B, if a 5 per
cent level of significance is used?


Solution: 
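$H_0: p_A = p_B$
$H_1: p_A > p_B$

$n_A = 60$, $x_A = 18$, $n_B = 100$, $x_B = 22$

$$\bar p_A = \frac{x_A}{n_A} = \frac{18}{60} = 0.3, \qquad \bar p_B = \frac{x_B}{n_B} = \frac{22}{100} = 0.22$$

$$Z = \frac{(\bar p_A - \bar p_B) - (p_A - p_B)_{H_0}}{\sqrt{\hat p \hat q\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}} = \frac{0.3 - 0.22 - 0}{\sqrt{0.25 \times 0.75\left(\frac{1}{60} + \frac{1}{100}\right)}} = \frac{0.08}{\sqrt{0.25 \times 0.75 \times 0.0267}} = \frac{0.08}{0.071} = 1.13$$

where

$$\hat p = \frac{x_A + x_B}{n_A + n_B} = \frac{18 + 22}{60 + 100} = \frac{40}{160} = 0.25$$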


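The critical value of Z at 5 per cent level of significance is 1.645. The sample
value of Z = 1.13 lies in the acceptance region as shown in the figure below:

[Figure: Rejection region for Example 10.9. The sample value 1.13 lies in the acceptance region, to the left of the critical value 1.645.]

Therefore, there is not enough evidence to reject the null hypothesis, and one
cannot conclude that advertisement A is more effective than advertisement B.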
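
The two-proportion test can be sketched in Python as shown below. This is a minimal illustration under stated assumptions: the function name `two_proportion_z` is hypothetical, the pooled estimate of p is used for the standard error as in the derivation above, and the critical value 1.645 is hard-coded from Example 10.9.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled estimate of p."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    q_hat = 1 - p_hat
    std_err = math.sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    return (p1_bar - p2_bar) / std_err


# Example 10.9: 18 of 60 tried the product after advertisement A, 22 of 100 after B.
z = two_proportion_z(18, 60, 22, 100)
critical = 1.645  # one-tailed 5 per cent value, as quoted in the example
decision = "reject H0" if z > critical else "do not reject H0"
print(f"Z = {z:.2f}, decision: {decision}")
```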


Exercises 


1. The company XYZ manufacturing bulbs hypothesizes that the life of its 
bulbs is 145 hours with a known standard deviation of 210 hours. A random 
sample of 25 bulbs gave a mean life of 130 hours. Using a 0.05 level of 
significance, can the company conclude that the mean life of bulbs is less 
than the 145 hours? 


2. The manager of a hotel, trying to decide which of two supposedly
equally good cigarette-vending machines to install, tests each machine 500
times and finds that machine I fails to work (neither delivers the cigarettes
nor returns the money) 26 times and machine II fails to work 12 times.
Using a 0.05 level of significance, can he conclude that the two machines are
not equally good?


3. If 54 out of a random sample of 150 boys smoke, while 31 out of a random
sample of 100 girls smoke, can we conclude at the 0.05 level of significance
that the proportion of male smokers is higher than that of female smokers?
4. Advertisements claim the average nicotine content of a certain kind of
cigarette is 0.30 mg. Suspecting that this figure is too low, a consumer
protection service takes a random sample of 15 of these cigarettes from
different production lots and finds that their nicotine content has a mean of
0.33 mg with a standard deviation of 0.018 mg. Use the 0.05 level of
significance to test the null hypothesis μ = 0.30 against the alternative
hypothesis μ > 0.30.


5. In a study of the effectiveness of physical exercise in weight reduction, a
group of 11 persons engaged in a prescribed programme of physical exercise
for 45 days showed the following results:

S.No. | Weight before (pounds) | Weight after (pounds)
1 | 209 | 196
2 | 178 | 171
3 | 169 | 170
4 | 212 | 207
5 | 180 | 177
6 | 192 | 190
7 | 158 | 159
8 | 180 | 180
9 | 170 | 164
10 | 153 | 152
11 | 183 | 179

Use the 0.05 level of significance to test the null hypothesis that the
prescribed programme of exercise is not effective in reducing weight.


6. In a departmental store's study designed to test whether the mean balance
outstanding on 30-day charge accounts is the same in its two suburban branch
stores, random samples yielded the following results:

$n_1 = 60$, $\bar X_1$ = ₹ 6420, $s_1$ = ₹ 1600

$n_2 = 100$, $\bar X_2$ = ₹ 7141, $s_2$ = ₹ 2213

where the subscripts denote branch store 1 and branch store 2. Use the
0.05 level of significance to test the hypothesis against a suitable alternative.


7. A product is produced in two ways. A pilot test on 64 items from each
method indicates that the product of method 1 has a sample mean tensile strength
of 106 lbs and a standard deviation of 12 lbs, whereas in method 2 the
corresponding values of mean and standard deviation are 100 lbs and 10
lbs respectively. Greater tensile strength in the product is preferable. Use
an appropriate large sample test at the 5 per cent level of significance to test
whether or not method 1 is better for processing the product. State clearly
the null hypothesis.


8. 500 units from a factory are inspected and 12 are found to be defective;
800 units from another factory are inspected and 12 are found to be
defective. Can it be concluded at 5 per cent level of significance that the
production at the second factory is better than at the first factory?


9. Two types of new cars produced in India are tested for petrol mileage. One
group consisting of 36 cars averaged 14 km per litre while the other group
consisting of 72 cars averaged 12.5 km per litre.

(a) What test statistic is appropriate if $\sigma_1^2 = 1.5$ and $\sigma_2^2 = 2.0$?

(b) Test whether there exists a significant difference in the petrol
consumption of the two types of cars (use α = 0.01).


10. Intelligence tests on two groups of boys and girls gave the following results: 
Gender Mean | Standard Deviation | Sample Size 
Girls 75 | 15 | 150 
Boys 7a | 20 | 250 
Is there a difference in the mean scores obtained by the boys and girls? Let 
the level of significance be 5 per cent. 
Check Your Progress 
6. What are the least values of np and nq for which normal distribution can be 
used as a substitute for the binomial distribution? 
7. State the assumption under which the estimate of standard error of difference
between two sample proportions is obtained.


10.7 ANSWERS TO CHECK YOUR PROGRESS
QUESTIONS


1. A test is called two-sided (or two-tailed) if the null hypothesis gets rejected
when a value of the test statistic falls in either one or the other of the two
tails of its sampling distribution.

2. The probability of committing a Type II error is denoted by beta (β).

3. Before a sample is drawn from the population, it is very important to specify
the values of the test statistic that will lead to rejection or acceptance of the null
hypothesis. The set of values that leads to the rejection of the null hypothesis is called
the critical region.

4. In case the sample size is small (n < 30) and the sample is drawn from a normal
population with unknown standard deviation σ, a t test is
used to test a hypothesis about the mean.

5. The degrees of freedom in the two sample t test for testing the equality of
means is given by $n_1 + n_2 - 2$.

6. Whenever both np and nq (where n = number of trials, p = probability of
success and q = probability of failure) are at least 5, one can use the normal
distribution as a substitute for the binomial distribution.

7. The estimate of standard error of difference between two sample proportions
is obtained under the assumption that the null hypothesis is true.


10.8 SUMMARY

• A hypothesis is a statement or an assumption regarding a population, which
may or may not be true.

• The sequence of steps that needs to be followed for the testing of a hypothesis
is: setting up of a hypothesis, setting up of a suitable significance level,
determination of a test statistic, determination of the critical region, computing
the value of the test statistic and making the decision.

• In the test procedure for a single population mean or for examining the
equality of two population means, a Z test is appropriate for large samples,
whereas for small samples a t test is used under the two cases where: (i)
population variances are equal and (ii) population variances are not equal.

• In the testing procedures concerning the proportion of a single population
and the difference between two population proportions, the hypotheses
are tested using a Z test under the assumption that the
normal distribution could be used as an approximation to the binomial
distribution for a large sample.


10.9 KEY WORDS 


• Critical region: The region that leads to rejection of the null hypothesis.

• Level of significance: The probability of committing a Type I error.

• Null hypothesis: The hypothesis that is proposed with the intent of
being rejected.

• Type I error: This occurs when the null hypothesis is rejected when it is actually
true.


10.10 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What is alternative hypothesis? 
2. Write a short note on one-tailed and two-tailed tests. 


3. Briefly explain Type I and Type II error. 
Long-Answer Questions

1. Explain the various steps involved in the tests of hypothesis exercise.

2. Indicate whether a Z or t distribution is applicable in each of the following
cases while conducting a test for the population mean.

(i) n = 31, s = 12
(ii) n = 15, s = 9
(iii) n = 64, s = 8
(iv) n = 28, σ = 10
(v) n = 56, σ = 6


3. The company XYZ manufacturing bulbs hypothesizes that the life of its
bulbs is 145 hours with a known standard deviation of 210 hours. A random
sample of 25 bulbs gave a mean life of 130 hours. Using a 0.05 level of
significance, can the company conclude that the mean life of bulbs is less
than the 145 hours?


4. Average annual income of the employees of a company has been reported
to be ₹18,750. A random sample of 100 employees was taken. The average
annual income was found to be ₹19,240 with a standard deviation of ₹2,610.
Test at 5 per cent level of significance whether the sample results are
representative of population results.


5. If 54 out of a random sample of 150 boys smoke, while 31 out of a
random sample of 100 girls smoke, can we conclude at the 0.05 level
of significance that the proportion of male smokers is higher than that of
female smokers?


6. In a departmental store's study designed to test whether the mean balance
outstanding on 30-day charge accounts is the same in its two suburban branch
stores, random samples yielded the following results:

$n_1 = 60$, $\bar X_1$ = ₹ 6420, $s_1$ = ₹ 1600
$n_2 = 100$, $\bar X_2$ = ₹ 7141, $s_2$ = ₹ 2213

where the subscripts denote branch store 1 and branch store 2. Use the
0.05 level of significance to test the hypothesis against a suitable
alternative.
11.0 INTRODUCTION


In the last unit, we discussed the Z test for the equality of two population proportions.
Now, in case we have more than two populations and want to test the equality of all
of them simultaneously, it is not possible to do it using a Z test. This is because a Z test
can examine the equality of only two proportions at a time. In such a situation, the
chi-square test can come to our rescue and can carry out the test in one go.


The chi-square test is widely used in research. For the use of the chi-square
test, data is required in the form of frequencies. Data expressed in percentages or
proportions can also be used, provided it can be converted into frequencies. The
majority of the applications of chi-square ($\chi^2$) are with discrete data. The test
could also be applied to continuous data, provided it is reduced to certain categories
and tabulated in such a way that the chi-square may be applied.


Some of the important properties of the chi-square distribution are:

• Unlike the normal and t distribution, the chi-square distribution is not
symmetric.

• The values of a chi-square are greater than or equal to zero.

• The shape of a chi-square distribution depends upon the degrees of freedom.
With the increase in degrees of freedom, the distribution tends to normal.

There are many applications of a chi-square test. Some of them mentioned below
will be discussed in this unit:

• A chi-square test for the goodness of fit

• A chi-square test for the independence of variables

• A chi-square test for the equality of more than two population proportions.


11.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Discuss various applications of chi-square tests like:
  o a chi-square test for the goodness of fit
  o a chi-square test for the independence of variables
  o a chi-square test for the equality of more than two population
    proportions


11.2 A CHI-SQUARE TEST FOR THE GOODNESS 
OF FIT 


As discussed before, the data in chi-square tests is often in terms of counts or 
frequencies. The actual survey data may be on a nominal or higher scale of 
measurement. If it is on a higher scale of measurement, it can always be converted 
into categories. The real world situations in business allow for the collection of 
count data, e.g., gender, marital status, job classification, age and income. Therefore, 
a chi-square becomes a much sought after tool for analysis. The researcher has to 
decide what statistical test is implied by the chi-square statistic in a particular 
situation. Below are discussed common principles of all the chi-square tests. The 
principles are summarized in the following steps: 


• State the null and the alternative hypothesis about a population.

• Specify a level of significance.

• Compute the expected frequencies of the occurrence of certain events under
the assumption that the null hypothesis is true.

• Make a note of the observed counts of the data points falling in different
cells.

• Compute the chi-square value given by the formula:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Where,
$O_i$ = Observed frequency of the i-th cell
$E_i$ = Expected frequency of the i-th cell
k = Total number of cells
k - 1 = degrees of freedom

• Compare the sample value of the statistic as obtained in the previous step with
the critical value at a given level of significance and make the decision.


A goodness of fit test is a statistical test of how well the observed data
support the assumption about the distribution of a population. The test also
examines how well an assumed distribution fits the data. Many times, the
researcher assumes that the sample is drawn from a normal or any other distribution
of interest. A test of how well a normal or any other distribution fits the given data
may be of some interest.


Consider, for example, the case of the multinomial experiment, which is the
extension of a binomial experiment. In the multinomial experiment, the number of
categories k is greater than 2. Further, a data point can fall into one of the k
categories and the probability of the data point falling in the i-th category is a constant
and is denoted by $p_i$, where i = 1, 2, 3, 4, ..., k. In summary, a multinomial experiment
has the following features:

• There is a fixed number of trials.

• The trials are statistically independent.

• All the possible outcomes of a trial get classified into one of the several
categories.

• The probabilities for the different categories remain constant for each
trial.

Consider as an example that a respondent can fall into any one of four
non-overlapping income categories. Let the probabilities that the respondent will
fall into each of the four groups be denoted by the four parameters $p_1$, $p_2$, $p_3$
and $p_4$. Given these, the multinomial distribution with these parameters, and n the
number of people in a random sample, specifies the probabilities of any combination
of the cell counts.

Given such a situation, we may use a multinomial distribution to test how
well the data fits the assumption of k probabilities $p_1, p_2, ..., p_k$ of falling into the k
cells. The hypothesis to be tested is:

$H_0$: Probabilities of the occurrence of events $E_1, E_2, ..., E_k$ are given by
the specified probabilities $p_1, p_2, ..., p_k$
$H_1$: Probabilities of the k events are not the $p_i$ stated in the null hypothesis.

Such hypotheses could be tested using the chi-square statistic. Below are
given a set of illustrative examples.
Example 11.1: The manager of ABC ice-cream parlour has to take a decision
regarding how much of each flavour of ice-cream he should stock so that the
demands of the customers are satisfied. The ice-cream suppliers claim that among
the four most popular flavours, 62 per cent of customers prefer vanilla, 18 per cent
chocolate, 12 per cent strawberry and 8 per cent mango. A random sample of
200 customers produces the results as given below. At the α = 0.05 significance
level, test the claim that the percentages given by the suppliers are correct.

Flavour | Vanilla | Chocolate | Strawberry | Mango
Number preferring | 120 | 40 | 18 | 22
Solution: 
Let
$p_V$ : proportion of customers preferring vanilla flavour.
$p_C$ : proportion of customers preferring chocolate flavour.
$p_S$ : proportion of customers preferring strawberry flavour.
$p_M$ : proportion of customers preferring mango flavour.

$H_0$ : $p_V = 0.62$, $p_C = 0.18$, $p_S = 0.12$, $p_M = 0.08$
$H_1$ : Proportions are not those specified in the null hypothesis


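The expected frequencies corresponding to the various flavours under the
assumption that the null hypothesis is true are:

Vanilla = 200 x 0.62 = 124
Chocolate = 200 x 0.18 = 36
Strawberry = 200 x 0.12 = 24
Mango = 200 x 0.08 = 16

The computations for $\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$ are as under:

Flavour | O | E | O - E | (O - E)² | (O - E)²/E
Vanilla | 120 | 124 | -4 | 16 | 0.129
Chocolate | 40 | 36 | 4 | 16 | 0.444
Strawberry | 18 | 24 | -6 | 36 | 1.500
Mango | 22 | 16 | 6 | 36 | 2.250
Total | 200 | 200 | | | 4.323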


The computed value of chi-square is 4.323.

With k - 1 = 4 - 1 = 3 degrees of freedom, Table $\chi^2_3$ (5 per cent) = 7.815 (see Table 11.1)
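Table 11.1 Some critical values of $\chi^2$ for specified degrees of freedom

Degrees of freedom | α = 0.10 | α = 0.05 | α = 0.01
1 | 2.706 | 3.841 | 6.635
2 | 4.605 | 5.991 | 9.210
3 | 6.251 | 7.815 | 11.345
4 | 7.779 | 9.488 | 13.277
5 | 9.236 | 11.070 | 15.086
6 | 10.645 | 12.592 | 16.812
7 | 12.017 | 14.067 | 18.475
8 | 13.362 | 15.507 | 20.090
9 | 14.684 | 16.919 | 21.666
10 | 15.987 | 18.307 | 23.209
11 | 17.275 | 19.675 | 24.725
12 | 18.549 | 21.026 | 26.217
13 | 19.812 | 22.362 | 27.688
14 | 21.064 | 23.685 | 29.141
15 | 22.307 | 24.996 | 30.578
16 | 23.542 | 26.296 | 32.000
17 | 24.769 | 27.587 | 33.409
18 | 25.989 | 28.869 | 34.805
19 | 27.204 | 30.144 | 36.191
20 | 28.412 | 31.410 | 37.566
21 | 29.615 | 32.671 | 38.932
22 | 30.813 | 33.924 | 40.289
23 | 32.007 | 35.172 | 41.638
24 | 33.196 | 36.415 | 42.980
25 | 34.382 | 37.652 | 44.314
26 | 35.563 | 38.885 | 45.642
27 | 36.741 | 40.113 | 46.963
28 | 37.916 | 41.337 | 48.278
29 | 39.087 | 42.557 | 49.588
30 | 40.256 | 43.773 | 50.892

Note: For degrees of freedom ν greater than 30, the quantity $\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ may be used as a
normal variate with unit variance.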

[Figure: Rejection region for Example 11.1. The sample value of chi-square lies in the acceptance region, to the left of the critical value.]
As the sample $\chi^2$ lies in the acceptance region, accept $H_0$. Therefore, the customer
preference rates are as stated.
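
The goodness of fit computation of Example 11.1 can be reproduced with the Python sketch below. It is illustrative only: the function name `chi_square_gof` is hypothetical, only built-in operations are used, and the 5 per cent critical value for k - 1 = 3 degrees of freedom (7.815) is hard-coded as an assumption rather than looked up programmatically.

```python
def chi_square_gof(observed, probabilities):
    """Chi-square goodness of fit statistic for hypothesised cell probabilities."""
    n = sum(observed)
    # Expected count in each cell if the null hypothesis is true.
    expected = [n * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, expected, df


# Example 11.1: vanilla, chocolate, strawberry, mango.
observed = [120, 40, 18, 22]
claimed = [0.62, 0.18, 0.12, 0.08]
statistic, expected, df = chi_square_gof(observed, claimed)
critical = 7.815  # 5 per cent critical value for 3 degrees of freedom
decision = "reject H0" if statistic > critical else "do not reject H0"
print(f"chi-square = {statistic:.3f}, d.f. = {df}, decision: {decision}")
```

The same function covers the uniform case: passing seven probabilities of 1/7 each tests whether events are spread evenly over the days of a week.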


It may be worth pointing out that for the application of a chi-square test, the
expected frequency in each cell should be at least 5.0. Further, the sample
observations should be independently and randomly taken. In case it is found that
one or more cells have an expected frequency less than 5, one could still carry
out the chi-square analysis by combining them into meaningful cells so that the
expected number has a total of at least 5. Another point worth mentioning is that
the degrees of freedom, usually denoted by df in such cases, is given by k - 1,
where k denotes the number of cells (categories).
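
One way of combining cells with small expected frequencies can be sketched in Python as follows. This is only one possible pooling rule and is an assumption, not a rule stated in the text: neighbouring cells are merged from left to right, which is meaningful only when the categories have a natural order (for example income bands). The function name `pool_small_cells` and all counts below are hypothetical.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Merge neighbouring cells until every expected count is at least `minimum`."""
    pooled_obs, pooled_exp = [], []
    run_obs = run_exp = 0.0
    for o, e in zip(observed, expected):
        run_obs += o
        run_exp += e
        if run_exp >= minimum:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
            run_obs = run_exp = 0.0
    if run_exp > 0:
        # A small cell is left over at the end: fold it into the last pooled cell.
        if pooled_exp:
            pooled_obs[-1] += run_obs
            pooled_exp[-1] += run_exp
        else:
            pooled_obs.append(run_obs)
            pooled_exp.append(run_exp)
    return pooled_obs, pooled_exp


observed = [12, 3, 2, 20, 3]   # hypothetical observed counts
expected = [10, 4, 3, 21, 2]   # hypothetical expected counts; three are below 5
obs, exp = pool_small_cells(observed, expected)
chi_square = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
df = len(obs) - 1  # degrees of freedom are counted after pooling
print(obs, exp, round(chi_square, 3), df)
```

Note that the degrees of freedom are based on the number of cells that remain after pooling, not on the original number of categories.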


It may be noted that in Example 11.1, the hypothesized probabilities were 
not equal. There are situations where the hypothesized probabilities in each category 
are equal or in other words, the interest is in investigating the uniformity of the 
distribution. The following example would illustrate it. 


Example 11.2: An insurance company provides auto insurance and is analysing 
the data obtained from fatal crashes. A sample of the motor vehicle deaths is 
randomly selected for a two-year period. The number of fatalities is listed below 
for the different days of the week. At the 0.05 significance level, test the claim that 
accidents occur on different days with equal frequency. 


Day | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
Number of fatalities | 31 | 20 | 20 | 22 | 22 | 29 | 36

Solution:
Let

$p_1$ = Proportion of fatalities on Monday
$p_2$ = Proportion of fatalities on Tuesday
$p_3$ = Proportion of fatalities on Wednesday
$p_4$ = Proportion of fatalities on Thursday
$p_5$ = Proportion of fatalities on Friday
$p_6$ = Proportion of fatalities on Saturday
$p_7$ = Proportion of fatalities on Sunday

$H_0$ : $p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = \frac{1}{7}$
$H_1$ : At least one of these proportions is incorrect.
n = Total frequency = 31 + 20 + 20 + 22 + 22 + 29 + 36 = 180


The expected number of fatalities on each day of the week under the
assumption that the null hypothesis is true is given as under:

Monday = 180 x 1/7 = 25.714
Tuesday = 180 x 1/7 = 25.714
Wednesday = 180 x 1/7 = 25.714
Thursday = 180 x 1/7 = 25.714
Friday = 180 x 1/7 = 25.714
Saturday = 180 x 1/7 = 25.714
Sunday = 180 x 1/7 = 25.714


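The computation of sample chi-square value is given in the following table:

Observed Frequencies (O) | Expected Frequencies (E) | O - E | (O - E)² | (O - E)²/E
31 | 25.714 | 5.286 | 27.942 | 1.087
20 | 25.714 | -5.714 | 32.650 | 1.270
20 | 25.714 | -5.714 | 32.650 | 1.270
22 | 25.714 | -3.714 | 13.794 | 0.536
22 | 25.714 | -3.714 | 13.794 | 0.536
29 | 25.714 | 3.286 | 10.798 | 0.420
36 | 25.714 | 10.286 | 105.802 | 4.114
Total | | | | 9.233

The value of sample $\chi^2 = \sum_{i=1}^{7} \frac{(O_i - E_i)^2}{E_i} = 9.233$
Degrees of freedom = 7 - 1 = 6
Critical (Table) $\chi^2_6$ = 12.592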
Since the sample chi-square value is less than the tabulated $\chi^2$, there is not
enough evidence to reject the null hypothesis as shown in the figure below.

[Figure: Rejection region for Example 11.2. The sample chi-square 9.233 lies in the acceptance region, to the left of the critical chi-square 12.592.]


11.3 A CHI-SQUARE TEST FOR INDEPENDENCE 
OF VARIABLES 


The chi-square test can be used to test the independence of two variables each 
having at least two categories. The test makes use of contingency tables, also 
referred to as cross-tabs with the cells corresponding to a cross classification of 
attributes or events. A contingency table with 3 rows and 4 columns (as an example) 
is shown in Table 11.2. 


Table 11.2 Contingency Table with 3 Rows and 4 Columns

Second Classification Category | First Classification Category
 | 1 | 2 | 3 | 4 | Total
1 | $O_{11}$ | $O_{12}$ | $O_{13}$ | $O_{14}$ | $R_1$
2 | $O_{21}$ | $O_{22}$ | $O_{23}$ | $O_{24}$ | $R_2$
3 | $O_{31}$ | $O_{32}$ | $O_{33}$ | $O_{34}$ | $R_3$
Total | $C_1$ | $C_2$ | $C_3$ | $C_4$ | n

Assuming that there are r rows and c columns, the count in the cell
corresponding to the i-th row and the j-th column is denoted by $O_{ij}$, where i = 1, 2,
..., r and j = 1, 2, ..., c. The total for row i is denoted by $R_i$, whereas that
corresponding to column j is denoted by $C_j$. The total sample size is given by n,
which is also the sum of all the r row totals or the sum of all the c column totals.


The hypothesis test for independence is: 
H,: Row and column variables are independent of each other. 
H, : Row and column variables are not independent. 


The hypothesis is tested using a chi-square test statistic for independence given by: 


i=1 j=1 ij 


The degrees of freedom for the chi-square statistic are given by (7— 1) 
(c-1). 

For a given level of significance a, the sample value of the chi-square is 
compared with the critical value for the degree of freedom (r— 1) (c— 1) to make 
a decision. 

The expected frequency in the cell corresponding to the i-th row and the j-th
column is given by:

$$E_{ij} = \frac{R_i \times C_j}{n}$$

Where, $R_i$ = Total for the i-th row
$C_j$ = Total for the j-th column
n = Total sample size.
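
The expected-frequency formula and the test statistic above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the 2 x 2 table of counts are invented here and do not come from the text.

```python
def expected_frequencies(observed):
    """Expected counts E_ij = R_i * C_j / n under the independence hypothesis."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_frequencies(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of counts, invented purely for illustration.
observed = [[30, 20], [20, 30]]
chi2, df = chi_square_independence(observed)
print(round(chi2, 2), df)
```

The computed statistic would then be compared with the tabulated chi-square value for `df` degrees of freedom at the chosen level of significance.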

Let us consider a few examples: 


Example 11.3: A sample of 870 trainees was subjected to different types of 
training classified as intensive, good and average and their performance was noted 
as above average, average and poor. The resulting data is presented in the table 
below. Use a 5 per cent level of significance to examine whether there is any 
relationship between the type of training and performance. 


Training 
Performance 
Above average 40 
Average 
Poor 
Total 


Solution: 


$H_0$: Attribute performance and the training are independent.

$H_1$: Attribute performance and the training are not independent.

The expected frequencies corresponding to the i-th row and the j-th column in
the contingency table are denoted by $E_{ij}$, where i = 1, 2, 3 and j = 1, 2, 3.


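$E_{11} = \frac{290 \times 250}{870} = 83.33$
$E_{12} = \frac{290 \times 330}{870} = 110.00$
$E_{13} = \frac{290 \times 290}{870} = 96.67$
$E_{21} = \frac{300 \times 250}{870} = 86.21$
$E_{22} = \frac{300 \times 330}{870} = 113.79$
$E_{23} = \frac{300 \times 290}{870} = 100.00$
$E_{31} = \frac{280 \times 250}{870} = 80.46$
$E_{32} = \frac{280 \times 330}{870} = 106.21$
$E_{33} = \frac{280 \times 290}{870} = 93.33$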


The table of the observed and expected frequencies corresponding to the
i-th row and the j-th column and the computation of the chi-square are given in the
table below.

Cell (i, j) | $(O_{ij} - E_{ij})^2$
3,1 | 927.81
3,2 | 686.96
3,3 | 3211.49

$$\text{Sample } \chi^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 107.39$$

The critical value of the chi-square at 5 per cent level of significance with 4
degrees of freedom is given by 9.49. The sample value of the chi-square falls in
the rejection region as shown in the figure below.


[Figure: Rejection region for Example 11.3. The sample chi-square (107.39) lies beyond the critical value (9.49), i.e., in the rejection region.]


Therefore, the null hypothesis is rejected and one can conclude that there is 
an association between the type of training and performance. 


Example 11.4: The following table gives the number of good and defective parts 
produced by each of the three shifts in a factory: 


Shift Good Defective Total 
Day 
Evening 
Night 
Total 


Is there any association between the shift and the quality of the parts
produced? Use a 0.05 level of significance. 


Solution: 
$H_0$: There is no association between the shift and the quality of parts
produced.
$H_1$: There is an association between the shift and quality of parts.


The computations of the expected frequencies corresponding to the i-th row
and the j-th column of the contingency table are shown below (i = 1, 2, 3 and
j = 1, 2).


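$E_{11} = \frac{1,030 \times 2,000}{2,500} = 824$
$E_{12} = \frac{1,030 \times 500}{2,500} = 206$
$E_{21} = \frac{870 \times 2,000}{2,500} = 696$
$E_{22} = \frac{870 \times 500}{2,500} = 174$
$E_{31} = \frac{600 \times 2,000}{2,500} = 480$
$E_{32} = \frac{600 \times 500}{2,500} = 120$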


The table of the observed and expected frequencies corresponding to the
i-th row and the j-th column and the computation of the chi-square is given below:

The sample chi-square is

$$\chi^2 = \sum_{i=1}^{3} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 101.83$$

The critical value of the chi-square with 2 degrees of freedom at 5 per cent 
level of significance is given by 5.991. The null hypothesis is rejected as the sample 
chi-square lies in the rejection region as shown in the figure below. Therefore, the 
quality of parts produced is related to the shifts in which they were produced. 


[Figure: Rejection region for Example 11.4. The sample chi-square (101.83) lies beyond the critical chi-square (5.991), i.e., in the rejection region.]


It may be worth mentioning again that for the application of a chi-square
test of independence, the sample should be selected at random and the expected 
frequency in each cell should be at least 5. 


Check Your Progress 


1. What is a goodness of fit test? 

2. State the minimum expected frequency in each cell required for the 
application of a chi-square test. 

3. What is the assumption under which the expected frequencies in a cross 
table are computed? 


4. Can the chi-square test of independence be applied in a case where the 
sample is selected by criteria and the expected frequency in each cell is at 
least 5? 


11.4 A CHI-SQUARE TEST FOR THE EQUALITY 
OF MORE THAN TWO POPULATION 
PROPORTIONS 


In certain situations, the researchers may be interested to test whether the
proportion of a particular characteristic is the same in several populations. The
[...] same for the three age groups: 25 and under, over 25 and under 50, and 50 and
over. To take another example, the interest may be in determining whether in an
organization, the proportion of the satisfied employees in four categories (class I,
class II, class III and class IV employees) is the same. In a sense, the question of
whether the proportions are equal is a question of whether the populations
(age groups or employee categories) are homogeneous with respect to the
characteristics being studied. Therefore, the tests for equality of proportions
across several populations are also called tests of homogeneity.


The analysis is carried out exactly in the same way as was done for the
other two cases. The formula for a chi-square analysis remains the same. However,
two important assumptions here are different.

(i) We identify our populations (e.g., age groups or various classes of
employees) and draw the sample directly from these populations.

(ii) As we identify the populations of interest and sample from them
directly, the sizes of the samples from the different populations of interest
are fixed. This is also called a chi-square analysis with fixed marginal
totals.

The hypothesis to be tested is as under:

$H_0$: The proportion of people satisfying a particular characteristic is
the same in all populations.

$H_1$: The proportion of people satisfying a particular characteristic is
not the same in all populations.


The expected frequency for each cell could also be obtained by using the 
formula as explained earlier. There is an alternative way of computing the same, 
which would give identical results. This is shown in the following example: 


Example 11.5: An accountant wants to test the hypothesis that the proportion of 
incorrect transactions at four client accounts is about the same. A random sample 
of 80 transactions of one client reveals that 21 are incorrect; for the second client, 
the number is 25 out of 100; for the third client, the number is 30 out of 90 
sampled and for the fourth, 40 are incorrect out of a sample of 110. Conduct the
test at α = 0.05.


Solution: 


Let $p_1$ = Proportion of incorrect transactions for the 1st client
$p_2$ = Proportion of incorrect transactions for the 2nd client
$p_3$ = Proportion of incorrect transactions for the 3rd client
$p_4$ = Proportion of incorrect transactions for the 4th client
Let $H_0$: $p_1 = p_2 = p_3 = p_4$
$H_1$: All proportions are not the same.


The observed data in the problem can be rewritten as:

Transactions | Client 1 | Client 2 | Client 3 | Client 4 | Total
Incorrect transactions | 21 | 25 | 30 | 40 | 116
Correct transactions | 59 | 75 | 60 | 70 | 264
Total | 80 | 100 | 90 | 110 | 380


An estimate of the combined proportion of the incorrect transactions under
the assumption that the null hypothesis is true:

$$p = \frac{21 + 25 + 30 + 40}{80 + 100 + 90 + 110} = \frac{116}{380} = 0.305$$

q = combined proportion of the correct transactions
= 1 - p = 1 - 0.305 = 0.695


Using the above, the expected frequencies corresponding to the various 
cells are computed as shown below: 


Transactions | Client 1 | Client 2 | Client 3 | Client 4 | Total
Incorrect transactions | 80 x 0.305 = 24.4 | 100 x 0.305 = 30.5 | 90 x 0.305 = 27.45 | 110 x 0.305 = 33.55 | 115.9
Correct transactions | 80 x 0.695 = 55.6 | 100 x 0.695 = 69.5 | 90 x 0.695 = 62.55 | 110 x 0.695 = 76.45 | 264.1
Total | 80 | 100 | 90 | 110 | 380


In fact, the sum of each row/column in both the observed and expected
frequency tables should be the same. Here, a small discrepancy is found because
of rounding error. It can be easily verified that the expected frequencies
in each cell would be the same using the formula $E_{ij} = \frac{R_i \times C_j}{n}$, as already explained.


Now the value of the chi-square statistic can be calculated as:

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

$$= \frac{(21 - 24.4)^2}{24.4} + \frac{(25 - 30.5)^2}{30.5} + \frac{(30 - 27.45)^2}{27.45} + \frac{(40 - 33.55)^2}{33.55} + \frac{(59 - 55.6)^2}{55.6} + \frac{(75 - 69.5)^2}{69.5} + \frac{(60 - 62.55)^2}{62.55} + \frac{(70 - 76.45)^2}{76.45}$$

= 0.474 + 0.992 + 0.237 + 1.240 + 0.208 + 0.435 + 0.104 + 0.544
= 4.234
Degrees of freedom (df) = (2 - 1) x (4 - 1) = 3
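
The arithmetic of Example 11.5 can be checked with a short Python sketch. The variable names are illustrative assumptions; the counts and the critical value 7.815 are taken from the example. Because the sketch keeps the pooled proportion unrounded rather than using 0.305, the result comes out near 4.23 rather than exactly 4.234.

```python
# Counts from Example 11.5: incorrect transactions and sample sizes per client.
incorrect = [21, 25, 30, 40]
sample_sizes = [80, 100, 90, 110]

# Pooled proportion of incorrect transactions under the null hypothesis.
p = sum(incorrect) / sum(sample_sizes)
q = 1 - p

chi2 = 0.0
for x, n in zip(incorrect, sample_sizes):
    expected_incorrect = n * p
    expected_correct = n * q
    # Contribution of the "incorrect" cell and the "correct" cell for this client.
    chi2 += (x - expected_incorrect) ** 2 / expected_incorrect
    chi2 += ((n - x) - expected_correct) ** 2 / expected_correct

df = (2 - 1) * (len(incorrect) - 1)
critical_value = 7.815  # tabulated chi-square for 3 df at the 5 per cent level (from the text)
reject_null = chi2 > critical_value
print(round(chi2, 3), df, reject_null)
```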


The critical value of the chi-square with 3 degrees of freedom at 5 per cent
level of significance equals 7.815. Since the sample value of $\chi^2$ is less than the
critical value, there is not enough evidence to reject the null hypothesis. The null
hypothesis is therefore retained, and we conclude that there is no significant
difference in the proportion of incorrect transactions for the four clients.


Exercises 


1. A sample analysis of the examination results of 200 MBA students was
done. It was found that 46 students had failed, 68 had secured a third
division, 62 had secured a second division and the rest obtained first division.
Are these figures commensurate with the general examination result, which
is in the ratio of 2 : 3 : 3 : 2 for various categories respectively?


2. Of the 1000 workers in a factory exposed to an epidemic, 700 in all were 
attacked, 400 had been inoculated and of these, 200 were attacked. On 
the basis of this information can it be said that the inoculation and attack are 
independent? 


3. The following figures show the distribution of the digits in numbers chosen
at random from a telephone directory:

Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total
Frequency | 1026 | 1107 | 997 | 966 | 1075 | 933 | 1107 | 972 | 964 | 853 | 10,000

Test whether the digits may be taken to occur equally in the directory.
4. The number of automobile accidents per week in a certain city was as 
follows: 
12, 8, 20, 2, 14, 10, 15, 6, 9, 4 
Are these frequencies in agreement with the belief that the accident conditions 
were the same during the 10-week period? 


5. The divisional manager of a retail chain believes that the average number of
customers entering each of the five stores in his division weekly is the same.
In a given week, a manager reports the following number of customers in
the stores:

3000, 2960, 3100, 2780, 3160
Test the divisional manager's belief at a 10 per cent level of significance.


6. A cigarette company interested in the relation between sex of a person and
the type of cigarettes smoked has collected the following data from a random
sample of 150 persons:

Cigarette | Male | Female | Total
A | 25 | 30 | 55
B | 40 | 15 | 55
C | 30 | 10 | 40
Total | 95 | 55 | 150

Test whether the type of cigarette smoked and the sex are independent.
Check Your Progress

5. What is another name for tests for equality of proportions across several
populations?

6. Mention the degrees of freedom for chi-square test, if there are 3 rows
and two columns.


11.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. A goodness of fit test is a statistical test of how well the observed data
supports the assumption about the distribution of a population.

2. For the application of a chi-square test, the expected frequency in each cell
should be at least 5.

3. The expected frequencies in a cross table are computed under the
assumption that the null hypothesis is true.

4. No. For the application of a chi-square test of independence, the sample
should be selected at random and the expected frequency in each cell should
be at least 5.

5. The tests for equality of proportions across several populations are also
called tests of homogeneity.

6. If there are 3 rows and two columns, the degrees of freedom for the chi-square
test are (3 - 1) x (2 - 1) = 2.


11.6 SUMMARY


• The chi-square test has a variety of applications in research. Chi-square is a
non-symmetrical distribution taking non-negative values.

• It can be used to test the goodness of fit of a distribution, independence of
variables and equality of more than two population proportions.

• A necessary condition for the application of the chi-square test is that the
expected frequency in each cell should be at least 5.

• The first and foremost thing for the application of chi-square is the
computation of expected frequencies.

• The data in a chi-square test is in terms of counts or frequencies. In case the
actual data is on a scale higher than that of nominal or ordinal, it can always
be converted into categories.


11.7 KEY WORDS 


• Degrees of freedom: These are given by (r - 1) x (c - 1) for a contingency
table.


• Chi-square distribution: This is a non-symmetric distribution taking only
non-negative values.


• Non-symmetric distribution: Those distributions that are skewed towards
any one tail of the distribution.


11.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What is chi-square test of the goodness of fit? What precautions are 
necessary while applying this test? Point out its role in business decision- 
making. 

2. List the principles of the chi-square test. 


3. What are the features of a multinomial experiment?

Long-Answer Questions

1. What is a $\chi^2$ test? Point out its applications. Under what conditions is this
test applicable?

2. A cigarette company interested in the relation between sex of a person and
the type of cigarettes smoked has collected the following data from a random
sample of 150 persons:

Cigarette | Male | Female | Total
A | 25 | 30 | 55
B | 40 | 15 | 55
C | 30 | 10 | 40
Total | 95 | 55 | 150

Test whether the type of cigarette smoked and the sex are independent.


3. A survey was carried out in a state among the doctors belonging to the rural 
health service cadre (500 doctors) and among the medical education 
directorate cadre (300 teaching doctors). They were asked a question, 
‘Would it be acceptable to you, if the government proposes to hire all the
doctors on a fixed period contractual basis?’ The doctors were to answer
either as ‘Acceptable’ or ‘Not Acceptable’. There was no third category
‘Undecided’. The following was the data compiled in a cross-tabulated
format:
Doctors Acceptable Not Acceptable Total 
Rural Cadre 195 305 500 
Teaching Cadre 
Total 


Test an appropriate hypothesis using a 5 per cent level of significance. 


4. The following figures show the distribution of the digits in numbers chosen 
at random from a telephone directory: 


Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Total
Frequency | 1,026 | 1,107 | 997 | 966 | 1,075 | 933 | 1,107 | 972 | 964 | 853 | 10,000


Test whether the digits may be taken to occur equally in the directory. 
12.0 INTRODUCTION


In Unit 10, we discussed the test of hypothesis concerning the equality of two 
population means using both the Z and t tests. However, if there are more than 
two populations, the test for the equality of means could be carried out by 
considering two populations at a time. This would be a very cumbersome 
procedure. One easy way out could be to use the Analysis of Variance (ANOVA) 
technique. The technique helps in performing this test in one go and, therefore, is 
considered to be an important technique of analysis for the researcher. Through this
technique it is possible to draw inferences whether the samples have been drawn 
from populations having the same mean. 


The technique has found applications in the fields of economics, psychology, 
sociology, business and industry. It becomes handy in situations where we want to 
compare the means of more than two populations. Some examples could be to 
compare: 


• the mean cholesterol content of various diet foods
• the average mileage of, say, five automobiles
• the average telephone bill of households belonging to four different income
groups and so on.


RA Fisher developed the theory concerning ANOVA. The basic principle 
underlying the technique is that the total variation in the dependent variable is 
broken into two parts—one which can be attributed to some specific causes and 
the other that may be attributed to chance. The one which is attributed to the 
specific causes is called the variation between samples and the one which is attributed 
to chance is termed as the variation within samples. Therefore, in ANOVA, the 
total variance may be decomposed into various components corresponding to the 
sources of the variation. 
In ANOVA, the dependent variable in question is metric (interval or ratio
scale), whereas the independent variables are categorical (nominal scale). If there is 
one independent variable (one factor) divided into various categories, we have one- 
way or one-factor analysis of variance. In the two-way or two-factor analysis of 
variance, two factors each divided into the various categories are involved. 


In ANOVA, it is assumed that each of the samples is drawn from a normal 
population and each of these populations has an equal variance. Another assumption 
that is made is that all the factors except the one being tested are controlled (kept 
constant). Basically, two estimates of the population variance are made. One
estimate is based upon the variation between the samples and the other one is
based upon the variation within the samples. The two estimates of variance can be
compared for their equality using the F statistic.


12.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Explain the meaning and assumptions of conducting analysis of variance
• Describe completely randomized design
• Describe the randomized block design in two-way analysis of variance
• Explain a factorial design


12.2 COMPLETELY RANDOMIZED DESIGN IN A 
ONE-WAY ANOVA 


Completely randomized design involves the testing of the equality of means of two 
or more groups. In this design, there is one dependent variable and one independent 
variable. The dependent variable is metric (interval/ratio scale) whereas the 
independent variable is categorical (nominal scale). A sample is drawn at random
from each category of the independent variable. The size of the sample from each 
category could be equal or different. Let us consider a few examples to illustrate a 
one-way analysis of variance. 


Example 12.1: Suppose we want to compare the cholesterol contents of the four 
competing diet foods on the basis of the following data (in milligrams per package) 
which were obtained for three randomly taken 6-ounce packages of each of the 
diet foods: 


Diet Food A 3.6 4.1 4.0 
Diet Food B 3.1 3.2 3.9 
Diet Food C 3.2 3.5 3.5 
Diet Food D 3.5 3.8 3.8 


We want to test whether the difference among the sample means can be
attributed to chance at the 5 per cent level of significance.


Solution: As explained earlier, the total variation in the data set can be expressed
as a sum of the variation that can be attributed to specific sources (in this example,
the various diet foods) plus the variation that is attributed to chance. The total
variation in the data set is called the total sum of squares (TSS) and is computed
as:

$$TSS = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}^2 - \frac{1}{kn} T_{..}^2$$

Where, (i = 1, ..., k and j = 1, 2, ..., n)
$x_{ij}$ = the j-th observation of the i-th sample (diet food)
$T_{..}$ = Grand total of all the data
k = 4 (number of diet foods)
n = 3 (number of observations in each sample)

The term $\frac{1}{kn} T_{..}^2$ is referred to as the correction factor. The variation between
the sample means which is attributed to specific sources or causes is referred to as
the treatment sum of squares (TrSS). This is computed using the following formula:

$$TrSS = \frac{1}{n} \sum_{i=1}^{k} T_{i.}^2 - \frac{1}{kn} T_{..}^2$$

Where, $T_{i.}$ = Total of observations for the i-th treatment.


The variation within the sample, which is attributed to chance, is referred to 
as the error sum of squares (SSE). This could be computed by subtracting the 
treatment sum of squares from the total sum of squares. This is shown as: 


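SSE = TSS - TrSS

$$= \left[ \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}^2 - \frac{1}{kn} T_{..}^2 \right] - \left[ \frac{1}{n} \sum_{i=1}^{k} T_{i.}^2 - \frac{1}{kn} T_{..}^2 \right]$$

In order to test the null hypothesis,
$H_0$: $\mu_A = \mu_B = \mu_C = \mu_D$
against the alternative hypothesis
$H_1$: At least two means are not equal
(Treatment means are not equal)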


We compare TrSS with SSE, each divided by its degrees of freedom. The necessary
workings required for this are presented in Table 12.1, which is called the one-way
analysis of variance table. The first column of the table indicates the sources of
variation. The second column lists the degrees of freedom. There are k treatments;
therefore the corresponding degrees of freedom are k - 1. Similarly, the total number of
observations in the data set is kn and therefore, the corresponding degrees of
freedom are kn - 1. The degrees of freedom for error are obtained by subtracting
from the total degrees of freedom the degrees of freedom corresponding to the
treatment, i.e., (kn - 1) - (k - 1) = k(n - 1). The third column lists the sum of
squares due to the various sources of variation. The fourth column lists the mean
square due to treatment, MSTr = TrSS/(k - 1), and the mean square due to error,
MSE = SSE/(k(n - 1)), obtained by dividing the corresponding sums of squares by
their degrees of freedom. The last column indicates the F statistic given as the ratio
of the two mean squares, F = MSTr/MSE, with k - 1 degrees of freedom for the
numerator and k(n - 1) degrees of freedom for the denominator. For a given level of
significance, the computed F statistic is compared with the table value of F with
k - 1 degrees of freedom in the numerator and k(n - 1) degrees of freedom in the
denominator. If the computed F value is greater than the tabulated F value, the null
hypothesis is rejected.


Table 12.1 One-way ANOVA 


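Sources of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F
Treatments (Diet food) | k - 1 | TrSS | MSTr = TrSS/(k - 1) | MSTr/MSE
Error | k(n - 1) | SSE | MSE = SSE/(k(n - 1)) |
Total | kn - 1 | TSS | |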


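The required computations in case of Example 12.1 are given below:
k = 4, n = 3

$T_{..}$ = 3.6 + 4.1 + 4.0 + 3.1 + 3.2 + 3.9 + 3.2 + 3.5 + 3.5 + 3.5 + 3.8 + 3.8 = 43.2
$T_{1.}$ = 3.6 + 4.1 + 4.0 = 11.7
$T_{2.}$ = 3.1 + 3.2 + 3.9 = 10.2
$T_{3.}$ = 3.2 + 3.5 + 3.5 = 10.2
$T_{4.}$ = 3.5 + 3.8 + 3.8 = 11.1
$\sum \sum x_{ij}^2$ = (3.6)^2 + (4.1)^2 + (4.0)^2 + (3.1)^2 + (3.2)^2 + (3.9)^2 + (3.2)^2 + (3.5)^2 +
(3.5)^2 + (3.5)^2 + (3.8)^2 + (3.8)^2 = 156.70

$$TSS = \sum_{i=1}^{4} \sum_{j=1}^{3} x_{ij}^2 - \frac{1}{kn} T_{..}^2 = 156.70 - \frac{1}{12}(43.2)^2 = 1.18$$

$$TrSS = \frac{1}{n} \sum_{i=1}^{4} T_{i.}^2 - \frac{1}{kn} T_{..}^2 = \frac{1}{3}\left[11.7^2 + 10.2^2 + 10.2^2 + 11.1^2\right] - \frac{1}{12}(43.2)^2 = 0.54$$

SSE = TSS - TrSS
= 1.18 - 0.54 = 0.64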


The above results corresponding to Example 12.1 could be set up in the
ANOVA Table 12.2.

Table 12.2 ANOVA Table for Example 12.1

Sources of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F
Treatments (Diet Food) | 3 | 0.54 | 0.18 | 2.25
Error | 8 | 0.64 | 0.08 |
Total | 11 | 1.18 | |


Assuming the level of significance to be 5 per cent, the table value of F with 
3 degrees of freedom in the numerator and 8 degrees of freedom in the denominator 
equals 4.07 (See Table 12.3). Since the computed F is less than the tabulated F, 
there is not enough evidence to reject the null hypothesis. Therefore, the difference 
in the cholesterol contents in the four diet foods could be attributed to chance. 
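
The computing formulas for TSS, TrSS, SSE and F can be checked with a short Python sketch on the data of Example 12.1. The function name and the dictionary keys are illustrative choices, not part of the text, and the sketch assumes equal sample sizes as in this example.

```python
def one_way_anova(samples):
    """One-way ANOVA for k samples of equal size n, using the computing formulas."""
    k = len(samples)
    n = len(samples[0])
    grand_total = sum(sum(s) for s in samples)
    correction = grand_total ** 2 / (k * n)          # correction factor T..^2 / kn
    tss = sum(x ** 2 for s in samples for x in s) - correction
    trss = sum(sum(s) ** 2 for s in samples) / n - correction
    sse = tss - trss
    mstr = trss / (k - 1)                            # mean square due to treatment
    mse = sse / (k * (n - 1))                        # mean square due to error
    return {"TSS": tss, "TrSS": trss, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse}


# Cholesterol data of Example 12.1 (diet foods A, B, C and D).
diet_foods = [
    [3.6, 4.1, 4.0],
    [3.1, 3.2, 3.9],
    [3.2, 3.5, 3.5],
    [3.5, 3.8, 3.8],
]
result = one_way_anova(diet_foods)
for name, value in result.items():
    print(name, round(value, 2))
```

The F value obtained this way is then compared with the tabulated F for 3 and 8 degrees of freedom, exactly as in the text.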


As mentioned earlier, the size of the sample from each category (treatment)
need not be the same. If there are $n_i$ observations corresponding to the i-th treatment,
the computing formulas for the sums of squares would look like:

$$TSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \frac{1}{N} T_{..}^2$$

$$TrSS = \sum_{i=1}^{k} \frac{T_{i.}^2}{n_i} - \frac{1}{N} T_{..}^2$$

SSE = TSS - TrSS
Where, $N = n_1 + n_2 + \dots + n_k$

The total number of degrees of freedom in this case is N - 1, and the degrees
of freedom are k - 1 for the treatments and N - k for the error. Let us consider an
example.
Table 12.3(a) Significance points of the variance-ratio ‘F’: 5 per cent points of F

$v_2$ \ $v_1$ | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 12 | 24 | ∞
1 | 161.4 | 199.5 | 215.7 | 224.6 | 230.2 | 234.0 | 238.9 | 243.9 | 249.0 | 254.3
2 | 18.51 | 19.00 | 19.16 | 19.25 | 19.30 | 19.33 | 19.37 | 19.41 | 19.45 | 19.50
3 | 10.13 | 9.55 | 9.28 | 9.12 | 9.01 | 8.94 | 8.84 | 8.74 | 8.64 | 8.53
4 | 7.71 | 6.94 | 6.59 | 6.39 | 6.26 | 6.16 | 6.04 | 5.91 | 5.77 | 5.63
5 | 6.61 | 5.79 | 5.41 | 5.19 | 5.05 | 4.95 | 4.82 | 4.68 | 4.53 | 4.36
10 | 4.96 | 4.10 | 3.71 | 3.48 | 3.33 | 3.22 | 3.07 | 2.91 | 2.74 | 2.54
11 | 4.84 | 3.98 | 3.59 | 3.36 | 3.20 | 3.09 | 2.95 | 2.79 | 2.61 | 2.40
12 | 4.75 | 3.88 | 3.49 | 3.26 | 3.11 | 3.00 | 2.85 | 2.69 | 2.50 | 2.30
13 | ... | ... | 3.41 | 3.18 | 3.02 | 2.92 | 2.77 | 2.60 | ... | ...
14 | ... | ... | 3.34 | 3.11 | 2.96 | 2.85 | 2.70 | 2.53 | ... | ...
15 | 4.54 | 3.68 | 3.29 | 3.06 | 2.90 | 2.79 | 2.64 | 2.48 | 2.29 | 2.07
17 | 4.45 | 3.59 | 3.20 | 2.96 | 2.81 | 2.70 | 2.55 | 2.38 | 2.19 | 1.96
18 | 4.41 | 3.55 | 3.16 | 2.93 | 2.77 | 2.66 | 2.51 | 2.34 | 2.15 | 1.92
20 | 4.35 | 3.49 | 3.10 | 2.87 | 2.71 | 2.60 | 2.45 | 2.28 | 2.08 | 1.84
21 | 4.32 | 3.47 | 3.07 | 2.84 | 2.68 | 2.57 | 2.42 | 2.25 | 2.05 | 1.81
25 | 4.24 | 3.38 | 2.99 | 2.76 | 2.60 | 2.49 | 2.34 | 2.16 | 1.96 | 1.71
27 | 4.21 | 3.35 | 2.96 | 2.73 | 2.57 | 2.46 | 2.30 | 2.13 | 1.93 | 1.67
28 | 4.20 | 3.34 | 2.95 | 2.71 | 2.56 | 2.44 | 2.29 | 2.12 | 1.91 | 1.65
29 | 4.18 | 3.33 | 2.93 | 2.70 | 2.54 | 2.43 | 2.28 | 2.10 | 1.90 | 1.64
30 | 4.17 | 3.32 | 2.92 | 2.69 | 2.53 | 2.42 | 2.27 | 2.09 | 1.89 | 1.62
40 | 4.08 | 3.23 | 2.84 | 2.61 | 2.45 | 2.34 | 2.18 | 2.00 | 1.79 | 1.51

$v_1$ = Degrees of freedom for greater variance.
$v_2$ = Degrees of freedom for smaller variance.
Table 12.3(b) Significance points of the variance-ratio ‘F’: 1 per cent points of F

$v_2$ \ $v_1$ | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 12 | 24 | ∞
1 | 4052 | 5000 | 5403 | 5625 | 5764 | 5859 | 5982 | 6106 | 6235 | 6366
3 | 34.12 | 30.82 | ... | ... | 28.24 | 27.91 | 27.49 | 27.05 | 26.60 | 26.13
4 | 21.20 | 18.00 | ... | ... | 15.52 | 15.21 | 14.80 | 14.37 | 13.93 | 13.45
5 | 16.26 | 13.27 | ... | ... | 10.97 | 10.67 | 10.29 | 9.89 | 9.47 | 9.02
7 | 12.25 | 9.55 | ... | ... | 7.46 | 7.19 | 6.84 | 6.47 | 6.07 | 5.65
8 | 11.26 | 8.65 | ... | ... | 6.63 | 6.37 | 6.03 | 5.67 | 5.28 | 4.86
9 | 10.56 | 8.02 | ... | ... | 6.06 | 5.80 | 5.47 | 5.12 | 4.73 | 4.31
12 | 9.33 | 6.93 | ... | ... | 5.06 | 4.82 | 4.50 | 4.16 | 3.78 | 3.36
13 | 9.07 | 6.70 | ... | ... | 4.86 | 4.62 | 4.30 | 3.96 | 3.59 | 3.17
14 | 8.86 | 6.51 | ... | ... | 4.69 | 4.46 | 4.14 | 3.80 | 3.43 | 3.00
16 | 8.53 | 6.23 | 5.29 | 4.77 | 4.44 | 4.20 | 3.89 | 3.55 | 3.18 | 2.75
17 | 8.40 | 6.11 | 5.18 | 4.67 | 4.34 | 4.10 | 3.79 | 3.46 | 3.08 | 2.65
18 | 8.29 | 6.01 | 5.09 | 4.58 | 4.25 | 4.01 | 3.71 | 3.37 | 3.00 | 2.59
21 | 8.02 | 5.78 | ... | ... | 4.04 | 3.81 | 3.51 | 3.17 | 2.80 | 2.36
22 | 7.95 | 5.72 | ... | ... | 3.99 | 3.76 | 3.45 | 3.12 | 2.75 | 2.31
23 | 7.88 | 5.66 | ... | ... | 3.94 | 3.71 | 3.41 | 3.07 | 2.70 | 2.26
25 | ... | ... | ... | 4.18 | 3.85 | 3.63 | 3.32 | 2.99 | 2.62 | 2.17
28 | ... | ... | ... | 4.07 | 3.75 | 3.53 | 3.23 | 2.90 | 2.52 | 2.06
40 | ... | ... | ... | 3.83 | 3.51 | 3.29 | 2.99 | 2.66 | 2.29 | 1.80
60 | ... | ... | 4.13 | 3.65 | 3.34 | 3.12 | 2.82 | 2.50 | 2.12 | 1.60
120 | ... | ... | 3.95 | 3.48 | 3.17 | 2.96 | 2.66 | 2.34 | 1.95 | 1.38

$v_1$ = Degrees of freedom for greater variance.
$v_2$ = Degrees of freedom for smaller variance.
Example 12.2: The following are the number of kilometres/litre which a test driver
with three different types of cars has obtained randomly on different occasions.

Car 1 | 15 | 14.5 | 14.8 | 14.9
Car 2 | 13 | 12.5 | 13.6 | 13.8 | 14
Car 3 | 12.8 | 13.2 | 12.7 | 12.6 | 12.9 | 13

Using a 5 per cent level of significance, perform a one-way ANOVA to 
examine the hypothesis that the difference in the average mileage in the three types 
of cars can be attributed to chance. 


Solution: 
$H_0$: $\mu_1 = \mu_2 = \mu_3$ (Average mileage in the three types of cars is the same.)
$H_1$: At least two types of cars do not have the same mileage.
k = 3, $n_1$ = 4, $n_2$ = 5, $n_3$ = 6
N = $n_1 + n_2 + n_3$ = 4 + 5 + 6 = 15


$T_{..}$ = 15 + 14.5 + 14.8 + 14.9 + 13 + 12.5 + 13.6 + 13.8 + 14 + 12.8
+ 13.2 + 12.7 + 12.6 + 12.9 + 13 = 203.3
$T_{1.}$ = 15 + 14.5 + 14.8 + 14.9 = 59.2
$T_{2.}$ = 13 + 12.5 + 13.6 + 13.8 + 14 = 66.9
$T_{3.}$ = 12.8 + 13.2 + 12.7 + 12.6 + 12.9 + 13 = 77.2
$\sum \sum x_{ij}^2$ = (15)^2 + (14.5)^2 + (14.8)^2 + (14.9)^2 + (13)^2 + (12.5)^2 + (13.6)^2 +
(13.8)^2 + (14)^2 + (12.8)^2 + (13.2)^2 + (12.7)^2 + (12.6)^2 + (12.9)^2 + (13)^2 = 2766.49

$$TSS = \sum_{i=1}^{3} \sum_{j=1}^{n_i} x_{ij}^2 - \frac{1}{N} T_{..}^2$$

= 2766.49 - (1/15)(203.3)^2
= 2766.49 - 2755.393
= 11.097

$$TrSS = \sum_{i=1}^{3} \frac{T_{i.}^2}{n_i} - \frac{1}{N} T_{..}^2$$

= (59.2)^2/4 + (66.9)^2/5 + (77.2)^2/6 - (1/15)(203.3)^2
= 2764.5886 - 2755.3926
= 9.196

SSE = TSS - TrSS
= 11.097 - 9.196
= 1.901


The ANOVA table in the case of Example 12.2 can be set up as shown in 
Table 12.4. 


Table 12.4 One-way ANOVA for Example 12.2 


Sources of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F
Treatments (Between groups) | 2 | 9.196 | 4.598 | 29.02
Error (within groups) | 12 | 1.901 | 0.158 |
Total | 14 | 11.097 | |


The computed F statistic equals 29.02. The table value of F with 2 degrees
of freedom in the numerator and 12 degrees of freedom in the denominator at a 5 
per cent level of significance is given by 3.89. As the computed F statistic is 
greater than the table F value, the null hypothesis is rejected. Therefore, the average 
mileage in these types of cars is statistically different. 
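
When the sample sizes differ, the same computation can be sketched in Python with N and the individual $n_i$ in place of kn and n. The function name and return layout below are illustrative assumptions; the data are those used in the computations of Example 12.2.

```python
def one_way_anova_unequal(samples):
    """One-way ANOVA allowing a different number of observations per treatment."""
    k = len(samples)
    total_n = sum(len(s) for s in samples)           # N = n_1 + n_2 + ... + n_k
    grand_total = sum(sum(s) for s in samples)
    correction = grand_total ** 2 / total_n          # T..^2 / N
    tss = sum(x ** 2 for s in samples for x in s) - correction
    trss = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    sse = tss - trss
    df_treatment, df_error = k - 1, total_n - k
    f_value = (trss / df_treatment) / (sse / df_error)
    return tss, trss, sse, f_value, (df_treatment, df_error)


# Mileage data (kilometres/litre) for the three cars of Example 12.2.
cars = [
    [15, 14.5, 14.8, 14.9],
    [13, 12.5, 13.6, 13.8, 14],
    [12.8, 13.2, 12.7, 12.6, 12.9, 13],
]
tss, trss, sse, f_value, dfs = one_way_anova_unequal(cars)
print(round(tss, 3), round(trss, 3), round(sse, 3), round(f_value, 2), dfs)
```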


Check Your Progress 
1. What is the nature of the dependent and independent variables in a 
completely randomized design? 


2. What does the fourth column of a one-way analysis of variance table
denote? 


12.3 RANDOMIZED BLOCK DESIGN IN TWO-WAY 
ANOVA 


In Example 12.1, it could not be shown that there really is a significant difference
in the average cholesterol content of the four diet foods. The results were not
statistically significant because there was considerable variation in the values
within each of the samples, resulting in a large experimental error. However, suppose
we have the additional information that the values were measured in three
different laboratories in such a way that the first value of each sample came
from laboratory 1, the second value from laboratory 2, and the third value from
laboratory 3 (the random assignment of test units to labs). In such a case, a two-way
analysis of variance is suggested. We had earlier partitioned the total sum of
squares into two components: one which is due to the differences between the
samples (treatment sum of squares) and the other one due to the differences within
the samples (error sum of squares). Now, this error sum of squares includes the
sum of squares due to laboratories (called blocks) as an extraneous factor. In
two-way analysis of variance, we remove the effect of the extraneous factor
(laboratories or blocks) from the error sum of squares. Therefore, the total sum of
squares is partitioned into three components: one due to treatment, the second due to
blocks and the third due to chance (called the error sum of squares). It may be
noted that the total sum of squares (TSS) and the treatment sum of squares (TrSS)
would remain the same as computed earlier in Example 12.1. In addition, we will
have another component called the block sum of squares (SSB), which is due to the
different laboratories and is computed as:


$$SSB = \frac{1}{k} \sum_{j=1}^{n} T_{.j}^2 - \frac{1}{kn} T_{..}^2$$

Where, $T_{.j}$ = Total of the values in the j-th block.
The error sum of squares would be computed as: 


SSE = TSS —TrSS — SSB 
There will be two hypotheses to be tested: 


I (Diet Food)
$H_0$: $\mu_A = \mu_B = \mu_C = \mu_D$
$H_1$: At least two means are not the same.

II (Blocks or Labs)
$H_0$: $\nu_1 = \nu_2 = \nu_3$ (Average cholesterol content in the three labs is the same.)
$H_1$: At least two means are not the same.


Now, we would need to test the equality of TrSS with SSE and SSB with 
SSE. The necessary workings required for this are presented in Table 12.5, called the
two-way analysis of variance table.


Table 12.5 Two-way ANOVA 


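Sources of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F
Treatments | k - 1 | TrSS | MSTr = TrSS/(k - 1) | MSTr/MSE
Blocks | n - 1 | SSB | MSB = SSB/(n - 1) | MSB/MSE
Error | (k - 1)(n - 1) | SSE | MSE = SSE/((k - 1)(n - 1)) |
Total | kn - 1 | TSS | |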


The various columns of the above table are filled up in the same fashion as 
was done for Table 12.1. Example 12.1 can be rewritten as Example 12.3. 


Example 12.3: Suppose in Example 12.1, the measurement of the cholesterol 
content was performed in three different laboratories. The first value of each 
sample came from one laboratory, the second value came from another 
laboratory, and the third value came from a third laboratory. The data is presented 
below: 


              Laboratory
Diet Food     | One | Two | Three
Diet Food A   | 3.6 | 4.1 | 4.0
Diet Food B   | 3.1 | 3.2 | 3.9
Diet Food C   | 3.2 | 3.5 | 3.5
Diet Food D   | 3.5 | 3.8 | 3.8


Perform a two-way ANOVA using a 0.05 level of significance. 


Solution: There will be two hypotheses to be tested in this case; one corresponding 
to the treatment (diet food) and the other corresponding to laboratories (blocks). 
These are listed below: 

I (Diet Food)
$H_0: \mu_A = \mu_B = \mu_C = \mu_D$ (Average cholesterol content of the four diet
foods is the same.)
$H_1$: At least two means are not the same.

II (Blocks or labs)
$H_0: \nu_1 = \nu_2 = \nu_3$ (Average cholesterol content in the three labs is
the same.)
$H_1$: At least two means are not the same.

The TSS and TrSS here would be the same as computed in Example 12.1. As
mentioned earlier, the block sum of squares is also required in this problem and is
computed using the formula:


$$SSB = \frac{1}{k}\sum_{j=1}^{n} T_j^2 - \frac{1}{kn}T^2$$


Where, $T_j$ = Total of the values in the $j$th block.
The error sum of squares would be obtained as 


SSE = TSS —TrSS — SSB 


The required computations for the two-way ANOVA are as under:
$T_1$ = 3.6 + 3.1 + 3.2 + 3.5 = 13.4
$T_2$ = 4.1 + 3.2 + 3.5 + 3.8 = 14.6
$T_3$ = 4.0 + 3.9 + 3.5 + 3.8 = 15.2


$$SSB = \frac{1}{4}\left[(13.4)^2 + (14.6)^2 + (15.2)^2\right] - \frac{1}{12}(43.2)^2$$


= 155.94 - 155.52
= 0.42
We have already computed in Example 12.1, the values of TSS and TrSS as under: 
TSS = 1.18, TrSS = 0.54 
Therefore, SSE = TSS —TrSS—SSB 
= 1.18 — 0.54 — 0.42 
= 0.22 


We note that the SSE in Example 12.1 was 0.64, whereas here it is 0.22.
This is because the earlier SSE has been partitioned into two components, namely,
the block sum of squares (SSB), with a value of 0.42, and the new
error sum of squares (SSE), with a value of 0.22. The required results for the testing
of the two hypotheses are presented in the ANOVA Table 12.6.


Table 12.6 Two-way ANOVA Table for Example 12.3


Sources of Variation   | Degrees of Freedom | Sum of Squares | Mean Square | F
Treatments (Diet Food) | 3                  | 0.54           | 0.18        | 0.18/0.0367 = 4.90
Block (Laboratories)   | 2                  | 0.42           | 0.21        | 0.21/0.0367 = 5.72
Error (Chance)         | 6                  | 0.22           | 0.0367      |
Total                  | 11                 | 1.18           |             |


The table values of $F^{3}_{6}$ and $F^{2}_{6}$ at a 5 per cent level of significance are 4.76
and 5.14 respectively. The corresponding sample F values are 4.90 and
5.72. Since the computed F values are greater than the corresponding table values,
the null hypothesis is rejected in both the cases. Therefore, it can be concluded
that there is a difference in the average cholesterol content due to the various diet
foods and because of the laboratories where the measurements were taken.
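
The arithmetic of Example 12.3 can be cross-checked with a few lines of Python. This is only a sketch: the readings are the ones that appear in the block totals worked out above, the function name `two_way_anova` and the variable names are illustrative choices, and the two critical values are simply the table values quoted in the text rather than values computed by the program.

```python
# Cholesterol readings for Example 12.3: one row per diet food (treatment),
# one column per laboratory (block), in the order One, Two, Three.
readings = {
    "A": [3.6, 4.1, 4.0],
    "B": [3.1, 3.2, 3.9],
    "C": [3.2, 3.5, 3.5],
    "D": [3.5, 3.8, 3.8],
}


def two_way_anova(table):
    """Sums of squares and F ratios for a randomized block design."""
    rows = list(table.values())
    k = len(rows)        # number of treatments
    n = len(rows[0])     # number of blocks
    grand_total = sum(sum(row) for row in rows)
    correction = grand_total ** 2 / (k * n)

    tss = sum(x ** 2 for row in rows for x in row) - correction
    trss = sum(sum(row) ** 2 for row in rows) / n - correction
    block_totals = [sum(row[j] for row in rows) for j in range(n)]
    ssb = sum(t ** 2 for t in block_totals) / k - correction
    sse = tss - trss - ssb

    mse = sse / ((k - 1) * (n - 1))
    return {
        "TSS": tss,
        "TrSS": trss,
        "SSB": ssb,
        "SSE": sse,
        "F_treatment": (trss / (k - 1)) / mse,
        "F_block": (ssb / (n - 1)) / mse,
    }


result = two_way_anova(readings)

# Table values quoted in the text for the 5 per cent level of significance.
F_CRIT_TREATMENT = 4.76   # 3 and 6 degrees of freedom
F_CRIT_BLOCK = 5.14       # 2 and 6 degrees of freedom

reject_treatment = result["F_treatment"] > F_CRIT_TREATMENT
reject_block = result["F_block"] > F_CRIT_BLOCK

for name, value in result.items():
    print(f"{name}: {value:.4f}")
print("Reject H0 for diet foods:", reject_treatment)
print("Reject H0 for laboratories:", reject_block)
```

The F ratios printed here may differ from the table in the last decimal place, because the table divides by the rounded mean square 0.0367 while the program keeps full precision.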


Check Your Progress 


3. What are the three components under which the total sum of square is 
partitioned? 


4. Which variable’s effect is removed in a randomized block design? 


12.4 FACTORIAL DESIGN 


In factorial design, the dependent variable is measured on an interval or ratio scale and
there are two or more independent variables measured on a nominal scale. In the factorial
design, it is possible to examine the interaction between the variables. If there are
two independent variables each having three categories, there would be a total of nine
interactions. The details on this are already explained in Unit 3 (Research Design).
The main advantage of factorial design over randomized block design is that it is
possible to measure the main effects as well as the interaction effects of two or
more independent variables at various levels. Further, the randomized block design
has only two independent variables whereas factorial design can take care of
more than two independent variables. Let us consider an illustration to explain
factorial design.


It is generally observed that there are differences in the pay packages offered 
to fresh MBA graduates. The variations could be either due to the type of business 
school where they have studied or it could be due to their area of specialization. 
The variation can also be due to an interaction between the business school and 
the area of specialization. For example, the specialization in finance at one business 
school might fetch a better package. All these presumptions could be tested with 
the help of the factorial design explained with the help of the following example. 


Example 12.4: The following data refers to the salary package (in lakhs) offered 
to MBA graduates with different specializations and having studied at four different 
business schools. For the sake of simplification, only two students are taken for 
each interaction between the institute and field of specialization. 


Business School 


Test the hypotheses that (i) the difference between the pay packages
offered can be attributed to chance, (ii) the average pay packages for all specializations
are equal, and (iii) the average pay packages for the 12 interactions are equal.
You may use a 5 per cent level of significance.
Solution: The following set of hypotheses is required to be tested. 


Business schools:

$H_0$: Average pay packages for all the institutions are equal.
$H_1$: Average pay packages for all the institutions are not equal.

Specialization:

$H_0$: Average pay packages for all the specializations are equal.
$H_1$: Average pay packages for all the specializations are not equal.

Interaction:

$H_0$: Average pay packages for all 12 interactions are equal.
$H_1$: Average pay packages for all 12 interactions are not equal.


Let us compute the following: 


$$\text{Correction factor (CF)} = \frac{(\text{Sum of all observations})^2}{\text{Total number of observations}} = \frac{(163)^2}{24} = 1107.04$$

Total sum of squares = (Sum of squares of observations) - CF
= $6^2 + 4^2 + 8^2 + 6^2 + \cdots + 7^2 + 5^2 + 9^2 + 10^2 - 1107.04$
= 1179 - 1107.04
= 71.96

Sum of squares due to specialization (row)/SSR
$$= \frac{44^2}{8} + \frac{56^2}{8} + \frac{63^2}{8} - CF$$
= 1130.13 - 1107.04
= 23.08
Where,
Sum total for Marketing = 44
Sum total for Finance = 56
Sum total for Operations = 63

Sum of squares due to school (column)/SSC
$$= \frac{39^2}{6} + \frac{32^2}{6} + \frac{46^2}{6} + \frac{46^2}{6} - CF$$
= 1129.5 - 1107.04
= 22.46
Where,
Sum total for Business School I = 39
Sum total for Business School II = 32
Sum total for Business School III = 46
Sum total for Business School IV = 46


Sum of squares due to interactions (SSI) = $n \sum_{i} \sum_{j} (\bar{X}_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$


Where,
$n$ = Number of observations for each interaction
$\bar{X}_{i.}$ = Mean of observations of the $i$th row
$\bar{X}_{.j}$ = Mean of observations of the $j$th column
$\bar{X}_{..}$ = Grand mean of all the observations
$\bar{X}_{ij}$ = Mean of observations of the $i$th row and $j$th column


The above terms can be calculated by first calculating the means of all the 
interactions and also the means of the corresponding rows and columns. These 
are presented in the table below: 


Business School 
Specialization I II III IV


Marketing 


Finance 


Operations 


SSI = $2 \sum_{i} \sum_{j} (\bar{X}_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$
= $2\left[(5.5 - 5.5 - 6.5 + 6.79)^2 + (4.5 - 5.5 - 5.33 + 6.79)^2 + \cdots + (9.5 - 7.88 - 7.67 + 6.79)^2\right]$
= 2 × 8.96 = 17.92


Sum of Squares due to error (SSE): 
SSE = TSS -SSR - SSC - SSI 
= 71.96 — 23.08 — 22.46 — 17.92 
= 8.5 
Therefore, the ANOVA table for factorial design could be prepared as
given in Table 12.7. 


Table 12.7 Results of ANOVA Table for Factorial Design


Sources of Variation     | Sum of Squares | Degrees of Freedom | Mean Sum of Squares | F
Row (Specialization)     | 23.08          | 2                  | 11.54               | 16.26
Column (Business School) | 22.46          | 3                  | 7.49                | 10.55
Interaction              | 17.92          | 6                  | 2.96                | 4.17
Error                    | 8.50           | 12                 | 0.71                |
Total                    | 71.96          | 23                 |                     |


The table values of $F^{2}_{12}$, $F^{3}_{12}$ and $F^{6}_{12}$ (at a 5 per cent level of significance) are
3.885, 3.490 and 2.996 respectively. As the computed values for the
hypotheses concerning specialization, business school and interaction are greater
than the corresponding tabulated values, the three null hypotheses are rejected.
It can therefore be concluded that the packages offered to the graduates
vary due to their specialization, the type of business school in which they have
studied and their interactions.
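
The row and column sums of squares, the error sum of squares and the three F ratios of Example 12.4 can be re-derived from the totals quoted in the solution. The sketch below does this under stated assumptions: the total sum of squares (71.96) and the interaction sum of squares (17.92) are taken as given from the text, since they depend on the individual observations, and all variable names are illustrative.

```python
# Totals quoted in the worked solution of Example 12.4.
row_totals = {"Marketing": 44, "Finance": 56, "Operations": 63}   # specializations
col_totals = {"I": 39, "II": 32, "III": 46, "IV": 46}             # business schools
n_per_cell = 2    # students per specialization-school combination
TSS = 71.96       # total sum of squares, taken from the text
SSI = 17.92       # interaction sum of squares, taken from the text

r, c = len(row_totals), len(col_totals)
n_obs = r * c * n_per_cell
grand_total = sum(row_totals.values())
cf = grand_total ** 2 / n_obs            # correction factor

# Each row total is a sum over c * n_per_cell observations,
# each column total a sum over r * n_per_cell observations.
ssr = sum(t ** 2 for t in row_totals.values()) / (c * n_per_cell) - cf
ssc = sum(t ** 2 for t in col_totals.values()) / (r * n_per_cell) - cf
sse = TSS - ssr - ssc - SSI

df_row, df_col = r - 1, c - 1
df_int = df_row * df_col
df_err = r * c * (n_per_cell - 1)
mse = sse / df_err

f_values = {
    "specialization": (ssr / df_row) / mse,
    "business school": (ssc / df_col) / mse,
    "interaction": (SSI / df_int) / mse,
}

# Table values quoted in the text for the 5 per cent level of significance.
f_table = {"specialization": 3.885, "business school": 3.490, "interaction": 2.996}
decisions = {name: f_values[name] > f_table[name] for name in f_values}

for name in f_values:
    print(f"{name}: F = {f_values[name]:.2f}, reject H0: {decisions[name]}")
```

Because the program keeps full precision while the table works with rounded mean squares, the F ratios it prints can differ from those in Table 12.7 in the first or second decimal place; the decisions are unaffected.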


Check Your Progress 


5. State the total number of interactions in a factorial design with two 
independent variables, one having two categories and second having three 
categories. 

6. What is the main advantage of factorial design over randomized block 
design? 


12.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. In a completely randomized design, the dependent variable is metric (interval/
ratio scale) whereas the independent variable is categorical (nominal scale).

2. The fourth column of a one-way analysis of variance table denotes the
mean square.

3. The total sum of squares is partitioned into three components: one due to
treatment, the second due to block and the third due to chance.

4. In a randomized block design, the effect of one extraneous variable is
removed.

5. In a factorial design with two independent variables, one having two
categories and the second having three categories, the total number of interactions
is six.

6. The main advantage of factorial design over randomized block design is
that it is possible to measure the main effects as well as the interaction
effects of two or more independent variables at various levels.


12.6 SUMMARY 


• R.A. Fisher developed the theory of analysis of variance. This technique
can be used to test the equality of more than two population means in one
go. The basic principle underlying the technique is that the total variation in
the dependent variable can be broken into two components: one which
can be attributed to specific causes and the other which may be attributed to
chance. In analysis of variance, the dependent variable is metric, whereas
the independent variable is categorical (nominal scale).


• The analysis of variance techniques in this unit are illustrated through the
completely randomized design, randomized block design and factorial
design.


• In a completely randomized design, there is one dependent and one
independent variable. The dependent variable is metric whereas the
independent variable is categorical. Random samples are drawn from each
category of the independent variable. The sample size from each category
could be the same or different.


• In the randomized block design, there is one independent variable and one
extraneous factor (block). Both the independent variable and the extraneous factor
(block) are nominal scale variables. The effect of the extraneous factor is
removed from the analysis.


• In factorial design, the dependent variable is metric and there are two or
more independent variables which are non-metric. In this design, it is possible
to examine the interaction between the variables. If there are two independent
variables each having three categories, there would be a total of nine interactions.


12.7 KEY WORDS 


• Analysis of variance: A technique used to compare means of two or
more samples (using the F distribution). This technique can be used only for
numerical data.

• Completely randomized design: A design that involves the testing of the
equality of means of two or more groups; there is one dependent variable
and one independent variable in this design.

• Factorial design: A design for an experiment that allows the experimenter
to find out the effect of two or more independent variables, each having two
or more categories, along with their interactions, on the dependent variable.

• One-way ANOVA: A technique that compares the means of two or more
groups based on one independent variable (or factor).

• Two-way ANOVA: A statistical test used to determine the effect of two
nominal predictor variables on a continuous outcome variable. A two-way
ANOVA test analyzes the effect of the independent variables on the expected
outcome along with their relationship to the outcome itself.


12.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What are the characteristics of randomized block design? 
2. Explain the meaning of interaction between the variables with the help of a 
suitable example. 
Long-Answer Questions 
1. What is the analysis of variance? What are the assumptions of the technique? 
Give a few examples where the technique could be used. 


2. Differentiate using suitable examples between the one-way and two-way 
analysis of variance. 


3. What is a factorial design? Explain the terms, main effects and interaction 
effects in relation to factorial design. 


4. The following data represents the numbers of units produced by four 
operators during three different shifts: 


Operator 
[we [ae fe r 
10 8 12 13 
SS E 
Il 12 10 11 14 


Perform a two-way analysis of variance and interpret the result. 
UNIT 13 RESEARCH REPORT
WRITING 


Structure 


13.0 Introduction 
13.1 Objectives 
13.2 Types of Research Reports 
13.2.1 Brief Reports 
13.2.2 Detailed Reports
13.3 Report Writing: Structure of the Research Report 
13.3.1 Preliminary Section 
13.3.2 Main Report 
13.3.3 Interpretations of Results and Suggested Recommendations 
13.4 Report Writing: Formulation Rules for Writing the Report 
13.4.1 Guidelines for Presenting Tabular Data 
13.4.2 Guidelines for Visual Representations: Graphs 
13.5 Answers to Check Your Progress Questions 
13.6 Summary 
13.7 Key Words 
13.8 Self Assessment Questions and Exercises 
13.9 Further Readings 


13.0 INTRODUCTION 


In the previous units, we have discussed and learnt about data collection and 
processing. On completion of the research study and after obtaining the research 
results, the real skill of the researcher lies in analysing and interpreting the findings 
and linking them with the propositions formulated in the form of research hypotheses 
at the beginning of the study. The statistical or qualitative summary of results would 
be little more than numbers or conclusions unless one is able to present the 
documented version of the research endeavour. 


One cannot overemphasize the significance of a well-documented and 
structured research report. Just like all the other steps in the research process, this 
requires careful and sequential treatment. In this unit, we will be discussing in detail 
the documentation of the research study. The format and the steps might be 
moderately adjusted and altered based on the reader’s requirement. Thus, it might 
be for an academic and theoretical purpose or might need to be clearly spelt and 
linked with the business manager’s decision dilemma. 


13.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Classify the various types of research reports
• Explain the process of report writing and presentation in business research
• Discuss the key features to be kept in mind in terms of the report format


13.2 TYPES OF RESEARCH REPORTS 


The research report has a very important role to play in the entire research process.
It is concrete proof of the study that was undertaken. It is a one-way
communication of the researcher’s study and analysis to the reader/manager, and
thus needs to be all-inclusive and yet neutral in its reporting. The significant roles
that a research report can play are as follows:


• The research report documents all the steps followed, right from framing the
research question to the interpretation of the study findings.

• Each step also includes details on how and why that step was conducted,
i.e. the justification for choosing one technique over the other.

• It also serves to authenticate the quality of the work carried out and
establishes the strength of the findings obtained.

• The report gives a clear direction in terms of the implication of the results
for the decision maker. This could be academic or applied depending on
the orientation.

• The report serves as a very important framework for anyone who would
like to do research in the same area or topic.


13.2.1 Brief Reports 


These kinds of reports are not formally structured and are generally short, sometimes 
not running more than four to five pages. The information provided has limited 
scope and is a prelude to the formal structured report that would subsequently 
follow. These reports could be designed in several ways. 


• Working papers or basic reports are written for the purpose of recording
the process carried out in terms of scope and framework of the study, the 
methodology followed and instrument designed. The results and findings 
would also be recorded here. However, the interpretation of the findings 
and study background might be missing, as the focus is more on the present 
study rather than past literature. 


• Survey reports might or might not have an academic orientation. The focus
here is to present findings in easy-to-comprehend format that includes figures 


Research Report Writing 


NOTES 


Self-Instructional 
Material 


239 


Research Report Writing 


240 


NOTES 


Self-Instructional 
Material 


and tables. The advantage of these reports is that they are simple and easy 
to understand and present the findings in a clear and usable format. 


13.2.2 Detailed Reports


These are more formal and could be academic, technical or business reports. 


• Technical reports: These are major documents and would include all
elements of the basic report, as well as the interpretations and conclusions,
as related to the obtained results. This would have a complete problem
background and any additional past data/records that are essential for
understanding and interpreting the study results. All sources of data, sampling
plan, data collection instrument(s), data analysis outputs would be formally
and sequentially documented.


• Business reports: These reports include conclusions as understood by
the business manager. The tables, figures and numbers of the first report
would now be pictorially shown as bar charts and graphs and the reporting
tone would be more in business terms. Tabular data might be attached in
the appendix.


Check Your Progress 


1. Name the type of report whose focus is more on the present study rather 
than past literature. 


2. Do survey reports always have an academic orientation? 


13.3 REPORT WRITING: STRUCTURE OF THE 
RESEARCH REPORT 


Whatever the type of report, the reporting requires a structured format and, by
and large, the process is standardized. The major difference amongst the types of
reports is that all the elements that make up a research report would be present only
in a detailed technical report and not in a management report. Usage of
theoretical and technical jargon would be higher in the technical report and visual
presentation of data would be higher in the management report.


The process of report formulation and presentation is presented in
Figure 13.1. As can be observed, the preliminary section includes the title page,
followed by the letter of authorization, acknowledgements, executive summary
and the table of contents. Then comes the background section, which includes the
problem statement, introduction, study background, scope and objectives of the
study and the review of literature (depending on the purpose). This is followed by
the methodology section, which, as stated earlier, is again specific to the technical
report. This is followed by the findings section and then come the conclusions.
The technical report would have a detailed bibliography at the end.


In the management report, the sequencing of the report might be reversed
to suit the needs of the decision-maker, as here the reader needs to review and
absorb the findings. Thus, the last section on interpretation of findings would be
presented immediately after the study objectives and a short reporting on
methodology could be presented in the appendix.


As presented in Figure 13.1, most research reports include the following
sections:


Preliminary Section
• Title Page
• Letter of Authorization
• Executive Summary
• Acknowledgements
• Table of Contents


Background Section
• Problem Statement
• Study Introduction and Background
• Scope and Objectives of the Study
• Review of Literature


Methodology Section
• Research Design
• Sampling Design
• Data Collection
• Data Analysis


Findings Section
• Results
• Interpretation of Results


Conclusions Section
• Conclusion and Recommendations
• Limitations of the Study


Appendices
Glossary of Terms
Bibliography

Fig. 13.1 The Process of Report Formulation and Writing 
13.3.1 Preliminary Section


This section mainly consists of identification information for the study conducted. 
It has the following individual elements: 
Title page: The title should be crisp and indicative of the nature of the project, as 
illustrated in the following examples. 

Comparative analysis of BPO workers and schoolteachers with 
reference to their work-life balance 

Segmentation analysis of luxury apartment buyers in the National 
Capital Region (NCR) 


Letter of transmittal: This is the letter that broadly refers to the purpose behind 
the study. The tone in this note can be slightly informal and indicative of the rapport 
between the client-reader and the researcher. A sample letter of transmittal is 
presented in Exhibit 13.1. The letter broadly refers to three issues. It indicates the 
term of the study or objectives; next it goes on to broadly give an indication of the 
process carried out to conduct the study and the implications of the findings. The 
conclusions generally are indicative of the researcher’s learning from the study. 


Exhibit 13.1


Sample Letter of Transmittal 
To: Mr Prem Parashar From: Nayan Navre 
Company: Just Bondas Corporation (JBC) Company: Jigyasa Associates 


Location: Mumbai 116879 Location: Sabarmati Dham, 
Mumbai 


Telephone: 8786767; 4876768 Telephone: 41765888 
Fax: 48786799 Fax: 41765899 
Addendums: Highlight of findings (pages: 20) 

15 January 2012 

Dear Prem, 


Please find the enclosed document which covers a summary of the findings of the
November-December 2011 study of the new product offering and its acceptability. I
would be sending three hard copies of the same tomorrow.


Once the core group has discussed the direction of the expected results I would request 
you to kindly get back with your comments/queries/suggestions, so that they can be 
incorporated in the preparation of the final report document. 


The major findings of the study were that the response of the non-vegetarians 
consuming the new keema bonda pav at Just Bondas was positive. As you can observe, 
however, the introduction of the non-vegetarian bonda has not been well received by 
the regular customers who visit the outlets for their regular alloo bonda. These findings, 
though on a small respondent base, are significant as they could be an indication of a 
deflecting loyal customer base. 


Best regards, 


Nayan 


Letter of authorization: The author of this letter is the business manager who 
formally gives the permission for executing the project. The tone of this letter, 
unlike the above document, is very precise and formal. 


Table of contents: All reports should have a section that clearly indicates the 
division of the report based on the formal areas of the study as indicated in the 
research structure. The major divisions and subdivisions of the study, along with 
their starting page numbers, should be presented. Once the major sections of the 
report are listed, the list of tables comes next, followed by the list of figures and
graphs, exhibits (if any) and finally the list of appendices. 


Executive summary: The summary of the entire report, starting from the scope 
and objectives of the study to the methodology employed and the results obtained, 
has to be presented in a brief and concise manner. The executive summary 
essentially can be divided into four or five sections. It begins with the study 
background, scope and objectives of the study, followed by the execution, including 
the sample details and methodology of the study. Next come the findings and
results obtained. The fourth section covers the conclusions and finally, the last 
section includes recommendations and suggestions. 


Acknowledgements: A small note acknowledging the contribution of the 
respondents, the corporates and the experts who provided inputs for accomplishing 
the study is included here. 


13.3.2 Main Report 


This is the most significant and academically robust part of the report. 


Problem definition: This section begins with the formal definition of the research 
problem. 


Study background: Study background essentially begins by presenting the 
decision-makers’ problem and then moves on to a description of the theoretical 
and contemporary market data that laid the foundation that guided the research. 


In case the study is an academic research, there is a separate section devoted
to the review of related literature, which presents a detailed reporting of work 
done on the same or related topic of interest. 


Study scope and objectives: The logical arguments then conclude in the form of 
definite statements related to the purpose of the study. In case the study is causal 
in nature, the formulated hypotheses are presented here as well. 


Methodology of research: This section would essentially have five to six subsections
specifying the details of how the research was conducted. These would essentially
be:


• Research framework or design: The variables and concepts being
investigated are clearly defined, with a clear reference to the relationship
being studied. The justification for using a particular design also has to be
presented here.

• Sampling design: The entire sampling plan in terms of the population being
studied, along with the reasons for collecting the study-related information
from the given group, is given here.

• Data collection methods: In this section, the researcher should clearly list
the information needed for the study as drawn from the study objectives
stated earlier. The secondary data sources considered and the primary
instrument designed for the specific study are discussed here. However, the
final draft of the measuring instrument can be included in the appendix.

• Data analysis: The assumptions and constraints of the analysis need to be
explained here in simple, non-technical terms.

• Study results and findings: This is the most critical chapter of the report
and requires special care; it is probably also one of the longest chapters in
the document.


13.3.3 Interpretations of Results and Suggested Recommendations 


This section comes after the main report and contains interpretations of results 
and suggested recommendations. It presents the information in a summarized and 
numerical form. 


Sometimes, the research results obtained may not be in the direction as 
found by earlier researchers. Here, the skill of the researcher in justifying the obtained 
direction is based on his/her individual opinion and expertise in the area of study. 
After the interpretation of results, sometimes, the study requirement might be to 
formulate indicative recommendations to the decision-makers as well. Thus, in 
case the report includes recommendations, they should be realistic, workable and 
topically related to the industry studied. 


Limitations of the study 


The last part in this section is a brief discussion of the problems encountered 
during the study and the constraints in terms of time, financial or human resources. 


End notes 
The final section of the report provides all the supportive material in the study. 
Some of the common details presented in this section are as follows: 


Appendices: The appendix section follows the main body of the report and 
essentially consists of two kinds of information: 


1. Secondary information like long articles or in case the study uses/is based 
on/refers to some technical information that needs to be understood by the 
reader; long tables or articles or legal or policy documents. 


2. Primary data that can be compressed and presented in the main body of
the report. This includes the original questionnaire, discussion guides, formulae
used for the study, sample details, original data, long tables and graphs
which can be described in statement form in the text.


Bibliography: This is an important part of the final section as it provides the 
complete details of the information sources and papers cited in a standardized 
format. It is recommended to follow the publication manuals from the American 
Psychological Association (APA) or the Harvard method of citation for preparing 
this section. The reporting content of the bibliography could also be in terms of: 


e Selected bibliography: Selective references are cited in terms of relevance 
and reader requirement. Thus, the books or journals that are technical and 
not really needed to understand the study outcomes are not reported. 


e Complete bibliography: All the items that have been referred to, even 
when not cited in the text, are given here. 


e Annotated bibliography: Along with the complete details of the cited work, 
some brief information about the nature of information sought from the article 
is given. 

At this juncture, we would like to refer to citation in the form of a footnote.
To explain the difference, we would first like to explain what a typical footnote is:


Footnote: A typical footnote, as the name indicates, is part of the main report and 
comes at the bottom of a page or at the end of the main text. This could refer to a 
source that the author has referred to or it may be an explanation of a particular 
concept referred to in the text. 


The referencing protocol of a footnote and bibliography is different. In a 
footnote, one gives the first name of the person first and the surname next. However, 
this order is reversed in the bibliography. Here we start first with the surname and 
then the first name. In a bibliography, we generally mention the page numbers of 
the article or the total pages in the book. However, in a footnote, the specific page 
from which the information is cited is mentioned. A bibliography is generally arranged 
alphabetically depending on the author’s name, but in the footnote the reporting is 
based on the sequence in which they occur in the text. 
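
The contrast between the two protocols can be sketched in a few lines of Python. Everything in the sketch is illustrative: the `Source` record, the function names, the punctuation of the output and the two sources themselves are placeholders invented for the example and do not follow the APA or Harvard manuals. Only the three rules described above are modelled: the order of the names, the specific page against the page range or total pages, and the order of the entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    first_name: str
    surname: str
    title: str
    year: int
    pages: str  # page range of the article, or total pages of the book


def footnote(source, cited_page):
    """First name first, and only the specific page that was cited."""
    return (f"{source.first_name} {source.surname}, {source.title} "
            f"({source.year}), p. {cited_page}.")


def bibliography_entry(source):
    """Surname first, with the article's page range or the book's total pages."""
    return (f"{source.surname}, {source.first_name}. {source.title} "
            f"({source.year}), {source.pages}.")


def bibliography(sources):
    """One entry per source, arranged alphabetically by the author's name."""
    unique = set(sources)
    ordered = sorted(unique, key=lambda s: (s.surname.lower(), s.first_name.lower()))
    return [bibliography_entry(s) for s in ordered]


# Placeholder sources, invented purely for illustration.
article = Source("Given", "Zeta", "Placeholder Article", 2001, "pp. 10-25")
book = Source("Other", "Alpha", "Placeholder Book", 1999, "320 pages")

# Citations in the order in which they occur in the text: (source, page cited).
citations = [(article, 12), (book, 45), (article, 20)]

footnotes = [footnote(src, page) for src, page in citations]
entries = bibliography(src for src, _ in citations)

print("Footnotes (in order of occurrence):")
for line in footnotes:
    print(" ", line)
print("Bibliography (alphabetical):")
for line in entries:
    print(" ", line)
```

The footnotes keep the sequence of the text, so the same source can appear more than once with different pages, whereas the bibliography lists each source once in alphabetical order.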


Glossary of terms: In case there are specific terms and technical jargon used in 
the report, the researcher should consider putting a glossary in the form of a word 
list of terms used in the study. This section is usually the last section of the report. 


Check Your Progress 


3. Name the section which includes the title page, followed by the letter of 
authorization, acknowledgements, executive summary and the table of 
contents. 


4. What does the annotated bibliography include? 
13.4 REPORT WRITING: FORMULATION RULES
FOR WRITING THE REPORT


Listed below are some features of a good research study that should be kept in 
mind while documenting and preparing the report. 


• Clear report mandate: While writing the research problem statement and 
study background, the writer needs to be absolutely clear about why 
and how the problem was formulated. 


• Clearly designed methodology: Any research study has its unique 
orientation and scope and thus has a specific and customized research design, 
sampling and data collection plan. In studies that are not completely 
transparent about the procedures followed, one cannot be fully confident of 
the findings and the resulting conclusions. 


• Clear representation of findings: Complete honesty and transparency 
in stating the treatment of data and the editing of missing or contrary data is 
extremely critical. 


• Representativeness of study findings: A good research report is also 
explicit about the extent and scope of the results obtained, and about 
the applicability of the findings. 


Thus, some guidelines should be kept in mind while writing the report. 


• Command over the medium: Correct and effective language is critical 
in putting ideas and objectives in the vernacular of 
the reader/decision-maker. 


• Phrasing protocol: There is a debate about whether or not one should make use 
of personal pronouns while reporting. The use of personal pronouns, as in ‘I 
think ...’ or ‘in my opinion ...’, lends subjectivity and personalization to the 
judgement. Thus, the tone of the reporting should be neutral. For example: 


‘Given the nature of the forecasted growth and the opinion of the 
respondents, it is likely that the ...’ 


Whenever the writer reproduces verbatim information from another 
document, an expert's comment or a published source, it must be placed in inverted 
commas or italics and the author or source should be duly acknowledged. For 
example: 


Sarah Churchman, Head of Diversity, PricewaterhouseCoopers, states, ‘At 
PricewaterhouseCoopers we firmly believe that promoting work-life balance is a 
“business-critical” issue and not simply the “right thing to do”.’ 


The writer should avoid long sentences and break up the information into clear 
chunks, so that the reader can process it with ease. 


• Simplicity of approach: Along with grammatically and structurally correct 
language, care must be taken to avoid technical jargon as far as possible. In case 
it is important to use certain terminology, definitions of these terms can be 
provided in the glossary of terms at the end of the report. 


• Report formatting and presentation: In terms of paper quality, page margins 
and font style and size, a professional standard should be maintained. The font 
style must be uniform throughout the report. The topics, subtopics, headings and 
subheadings must be styled in the same manner throughout the report. The 
researcher can provide visual relief and variation by adequately supplementing the 
text with graphs and figures. 


13.4.1 Guidelines for Presenting Tabular Data 


Most research studies involve some form of numerical data, and even though one 
can discuss this in text, it is best represented in tabular form. The data can be given 
in simple summary tables, which contain only limited information and yet are 
critical to the report text. 


The mechanics of creating a summary table are very simple and are illustrated 
below with an example in Table 13.1. The illustration has been labelled with numbers 
which relate to the relevant section. 


Table identification details: The table must have a title (1a) and an identification 
number (1b). The table title should be short and usually would not include any 
verbs or articles. It only refers to the population or parameter being studied. The 
title should be briefly yet clearly descriptive of the information provided. The 
numbering of tables is usually in a series and generally one makes use of Hindu-Arabic 
numerals to identify them. 


Data arrays: The arrangement of data in a table is usually done in an ascending 
manner. This could be in terms of time, as shown in Table 13.1 (column-wise), 
or according to sectors or categories (row-wise), or locations, e.g., north, 
south, east, west and central. Sometimes, when the data is voluminous, it is 
recommended that one goes alphabetically, e.g., for country or state data. Sometimes 
there may be subcategories to the main categories. For example, under the total 
sales data (a column-wise component of the revenue statement) there could be 
subcategories of department store, chemists and druggists, mass merchandisers 
and others. These then have to be displayed under the sales data head, indented 
by a tab, as follows: 

Total sales 
    Mass market 
    Department store 
    Drug stores 
    Others (including paan beedi outlets) 


Table 13.1 Automobile Domestic Sales Trends 

                                       Yearwise data (number of cars) 
Category                    2002-2003    2003-2004    2004-2005    2006-2007    2007-2008 
----------------------------------------------------------------------------------------- 
Passenger vehicles......      707,198      902,096    1,061,572    1,143,076    1,379,979 
Commercial vehicles.....      190,682      260,114      318,430      351,041      467,765 
Three-wheelers..........      231,529      284,078      307,862      359,920      403,910 
Two-wheelers............    4,812,126    5,364,249    6,209,765    7,052,391    7,872,334 
Grand Total*                5,941,535    6,810,537    7,897,629    8,906,428   10,123,988 

*Does not include second hand car sales. 
Source: SIAM 


Measurement unit: The unit in which the parameter or information is presented 
should be clearly mentioned. 


Spaces, leaders and rulings (SLR): For limited data, the table need not be 
divided using grid lines or rulings; simple white spaces add to the clarity of the information 
presented and processed. In case the number of parameters is large, it is 
advisable to use vertical rulings. Horizontal lines are drawn to separate the headings 
from the main data, as can be seen in Table 13.1. When there are a number of 
subheadings, as in the sales data example, one may consider using leaders (.......) 
to assist the eye in reading the data. 


Total sales 
    Mass market......... 
    Department store......... 
    Drug stores......... 
    Others (including paan beedi outlets)......... 


Assumptions, details and comments: Any clarification or assumption made, or 
a special definition required to understand the data, or formula used to arrive at a 
particular figure, e.g., total market sale or total market size can be given after the 
main tabled data in the form of footnotes. 


Data sources: In case the information documented and tabled is secondary in 
nature, complete reference of the source must be cited after the footnote, if any. 


Special mention: In case some figure or information is significant and the reader 
should pay special attention to it, the number or figure can be bold or can be 
highlighted to increase focus. 
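
The guidelines above (title and number, measurement unit, leaders, a horizontal rule under the headings, footnote and source) can be combined in a short Python sketch. It is a minimal illustration only: the function name `render_table`, the column widths and the choice of plain-text output are assumptions, and the figures are the first and last year columns of Table 13.1.

```python
def render_table(table_no, title, unit, columns, rows, footnote, source):
    """Return the lines of a plain-text summary table and the column totals."""
    label_width = max(len(label) for label, _ in rows) + 6
    col_width = 12

    lines = [f"Table {table_no} {title}", f"({unit})"]
    header = "Category".ljust(label_width) + "".join(c.rjust(col_width) for c in columns)
    lines.append(header)
    lines.append("-" * len(header))  # horizontal rule separating headings from data

    for label, values in rows:
        # Leaders (dots) guide the eye from the row label to its figures.
        cells = "".join(f"{v:,}".rjust(col_width) for v in values)
        lines.append(label.ljust(label_width, ".") + cells)

    # The grand total is computed from the rows rather than typed in by hand.
    totals = [sum(values[i] for _, values in rows) for i in range(len(columns))]
    lines.append("Grand Total*".ljust(label_width)
                 + "".join(f"{t:,}".rjust(col_width) for t in totals))

    lines.append(f"*{footnote}")       # assumptions and comments follow the data
    lines.append(f"Source: {source}")  # the data source comes after the footnote
    return lines, totals


columns = ["2002-2003", "2007-2008"]
rows = [
    ("Passenger vehicles", [707_198, 1_379_979]),
    ("Commercial vehicles", [190_682, 467_765]),
    ("Three-wheelers", [231_529, 403_910]),
    ("Two-wheelers", [4_812_126, 7_872_334]),
]

lines, totals = render_table(
    "13.1", "Automobile Domestic Sales Trends", "Yearwise data, number of cars",
    columns, rows, "Does not include second hand car sales.", "SIAM",
)
print("\n".join(lines))
```

Computing the total from the rows gives a simple cross-check against the published grand totals.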


13.4.2 Guidelines for Visual Representations: Graphs 


Similar to the summarized and succinct data in the form of tables, the data can also 
be presented through visual representations in the form of graphs. 


Line and curve graphs: Usually, when the objective is to demonstrate trends 
and some sort of pattern in the data, a line chart is the best option available to the 
researcher. It is also possible to show patterns of growth of different sectors or 
industries in the same time period or to compare the change in the studied variable 
across different organizations or brands in the same industry. Certain points to be 
kept in mind while formulating line charts include: 


• The time units or the causal variable being studied are to be put on the X-axis, 
or the horizontal axis. 


• If the intention is to compare different series on the same chart, the lines 
should be of different colours or forms (Figure 13.2). 


[Figure: line chart rating three vehicles, Std. Economy Car (MSRP $10,500), Tata Nano 
(MSRP $2,500) and BUV (MSRP $3,000), on a Low/Med/High scale across features including 
convenience features, comfort, gas mileage, safety, style, warranty/service plan, 
dealer service, ease of maintenance and cargo capacity] 


Fig. 13.2 Comparative Analysis of Vehicles (including Nano) 
on Features Desired by Consumers 


• Too many lines are not advisable; an ideal number would be five or 
fewer lines on the chart. 


• The researcher must also take care to show the zero baseline in the 
chart, as otherwise the data can be misleading. For example, in 
Figure 13.3(a), where the zero baseline is shown, the expected 
change in the number of hearing aid units to be sold over the period 
2002-03 to 2007-08 can be read accurately. However, in Figure 
13.3(b), where the vertical axis starts at 1,50,000 units instead of zero, the rate of 
growth can be misjudged as swifter than it actually is. 

[Figure: line chart of Sales (Units) by year, 2002-2003 to 2007-2008, for three 
series (Pessimistic, Realistic, Optimistic), drawn with the zero baseline] 


Fig. 13.3(a) Expected Growth in the Number of Hearing Aids 
Units to be Sold in North India (three perspectives) 


[Figure: the same three series by Year, 2002-03 to 2007-08, with the Sales (Units) 
axis running from 150,000 to 500,000] 


Fig. 13.3(b) Expected Growth in the Number of Hearing Aids Units 
to be Sold in North India (three perspectives) 
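
The zero-baseline point can be checked with a little arithmetic. The sketch below compares the true growth ratio with the ratio a reader sees when the vertical axis starts above zero. The sales figures are hypothetical, chosen only to fit within the axis range of Figure 13.3(b); they are not read from the figure.

```python
def apparent_growth(start, end, axis_floor=0):
    """Ratio of the plotted heights of the last and first points above the axis floor."""
    if start <= axis_floor:
        raise ValueError("the starting value must lie above the axis floor")
    return (end - axis_floor) / (start - axis_floor)


# Hypothetical unit sales in the first and last year of the series.
start_units, end_units = 200_000, 450_000

true_ratio = apparent_growth(start_units, end_units)                 # axis starts at zero
truncated_ratio = apparent_growth(start_units, end_units, 150_000)   # axis starts at 1,50,000

print(f"True growth: {true_ratio:.2f} times the starting level")
print(f"Apparent growth on the truncated axis: {truncated_ratio:.2f} times")
```

With these assumed figures sales grow 2.25 times, but on the truncated axis the last point sits six times as high as the first, which is why the growth looks swifter than it is.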


Area or stratum charts: Area charts, like line charts, are usually used to 
demonstrate changes in a pattern over a period of time. The change in each of the 
components is shown individually on the same chart, and the components are 
stacked one on top of the other. The areas between the various 
lines indicate the scale or volume of the relevant factors/categories (Figure 13.4). 


[Figure: area chart of Count against Nano perception (scores 26.00 to 45.00) for three 
clusters: Innovator, Patriotic buyer and Dogmatic buyer] 


Fig. 13.4 Perception of Nano by Three Psychographic 
Segments of Two-wheeler Owners 


Pie charts: Another way of demonstrating the area or stratum or sectional 
representation is through pie charts. The critical difference between a line chart and 
a pie chart is that the pie chart cannot show changes over time. It simply shows the 
cross-section of a single time period. There are certain rules that the researcher 
should keep in mind while creating pie charts. 


• The complete data must be shown as a 100 per cent area of the subject 
being graphed. 


• It is a good idea to have the percentages displayed within or above the pie 
rather than in the legend, as it is then easier to understand the magnitude of 
the section in comparison to the total. For example, 
Figure 13.5 shows the brand-wise sales in units for the sample of existing brands 
of hearing aids in the North Indian market. 


[Figure: pie chart of brandwise sales (units)] 


Fig. 13.5 Brandwise Sales (units) of Hearing Aids 
in the North Indian Market (2002-03) 


• Showing changes over time is difficult through a pie chart, as stated earlier. 
However, the change in the components at different time periods could be 
demonstrated as in Figure 13.6, showing a sample share of the car market 
in India in 2009 and the expected market composition of 2015. 


[Figure: two pie charts, one for 2009 and one for 2015, with segments for Maruti Suzuki, 
Tata, Hyundai, Toyota, GM and Others] 


Fig. 13.6 Sample Structure of the Indian Car Market (2009) 
and the Forecasted Structure for 2015 
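
The first two pie chart rules above (the slices must add up to 100 per cent, and each slice should carry its own percentage label) can be sketched as follows. The brand names and unit sales are hypothetical placeholders, not the data behind Figure 13.5, and `pie_slices` is an illustrative helper rather than part of any charting library.

```python
def pie_slices(units_by_category):
    """Return (category, percentage share, label) for each slice of a pie chart."""
    total = sum(units_by_category.values())
    if total <= 0:
        raise ValueError("total must be positive to compute shares")
    slices = []
    for category, units in units_by_category.items():
        share = 100 * units / total
        # The label is meant to be placed on or next to the slice, not in the legend.
        slices.append((category, share, f"{category}: {share:.1f}%"))
    return slices


# Hypothetical brand-wise unit sales for a single time period.
sample_sales = {"Brand A": 4000, "Brand B": 3000, "Brand C": 2000, "Others": 1000}

slices = pie_slices(sample_sales)
for _, _, label in slices:
    print(label)

# The whole pie must represent 100 per cent of the subject being graphed.
print(f"Total: {sum(share for _, share, _ in slices):.1f}%")
```

To compare two periods, as in Figure 13.6, the same helper would simply be applied once per period and the two pies shown side by side.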


Bar charts and histograms: Bar diagrams are a very useful representation of the 
quantum or magnitude of different objects on the same parameter. The comparative 
position of objects becomes very clear. The usual practice is to formulate vertical 
bars; however, it is possible to use horizontal bars as well if none of the variables is 
time related [Figure 13.7(a)]. Horizontal bars are especially useful when one is 
showing both positive and negative patterns on the same graph [Figure 13.7(b)]. 
These are called bilateral bar charts and are especially useful to highlight the objects 
or sectors showing a varied pattern on the studied parameter. 


[Figure: horizontal bar chart of unit sales in thousands (scale 0 to 20) for four 
outlets: Just Bondas, Cafe Mumbai, Mumbai Masala and McDonald's] 


Fig. 13.7(a) Bar Chart per Day, Unit Sales (thousands) 
at Fast Food Outlets in Mumbai 


Fig. 13.7(b) Bilateral Bar Chart: the Brand Recall and 
Brand Purchase Response for Pizza Joints in the NCR 


Another variation of the bar chart is the histogram (Figure 13.8). Here 
the bars are vertical and the height of each bar reflects the relative or cumulative 
frequency of that particular value or class of the variable. 


[Figure: histogram of frequency against Marks (46.0 to 72.0) with a normal curve 
overlaid; Std. Dev = 6.24, Mean = 61.2, N = 37] 


Fig. 13.8 Histogram (with normal curve) Displaying Marks 
in a Course on Research Methods for Management 
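
The bar heights of a histogram come from a frequency table. The sketch below shows how the relative and cumulative frequencies mentioned above might be computed for equal-width classes. The marks are hypothetical (they are not the 37 observations behind Figure 13.8), and the class start and width are arbitrary choices made for illustration.

```python
from collections import Counter


def frequency_table(values, start, width):
    """Rows of (lower, upper, count, relative frequency, cumulative relative frequency)."""
    if any(v < start for v in values):
        raise ValueError("all values must be at or above the first class boundary")
    # Class k covers the interval [start + k*width, start + (k+1)*width).
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    rows, cumulative = [], 0
    for k in range(max(counts) + 1):
        count = counts.get(k, 0)   # empty classes are kept so the bars stay contiguous
        cumulative += count
        lower = start + k * width
        rows.append((lower, lower + width, count, count / n, cumulative / n))
    return rows


# Hypothetical marks for ten students.
marks = [47, 52, 55, 58, 58, 61, 61, 62, 64, 70]

table = frequency_table(marks, start=46, width=4)
for lower, upper, count, rel, cum in table:
    print(f"{lower}-{upper}: count={count}, relative={rel:.2f}, cumulative={cum:.2f}")
```

Plotting the relative column gives the usual histogram; plotting the cumulative column gives bars that rise steadily to 1.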

Check Your Progress 
5. What is the best option available to the researcher when the objective is to 
demonstrate trends and some sort of pattern in the data? 
6. State the ideal number of lines in a chart. 
7. Mention the things a pie chart cannot show. 


13.5 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. Working papers or basic reports are the type of report whose focus is 
more on the present study than on past literature. 

2. No, survey reports might or might not have an academic orientation. 

3. The preliminary section is the section which includes the title page, followed 
by the letter of authorization, acknowledgements, executive summary and 
the table of contents. 

4. The annotated bibliography includes complete details of the cited work, 
along with some brief information about the nature of information sought 
from the article. 

5. A line chart is the best option available to the researcher when the objective 
is to demonstrate trends and some sort of pattern in the data. 

6. The ideal number of lines in a chart is five or fewer. 

7. The pie chart cannot show changes over time, as it simply shows the 
cross-section of a single time period. 


13.6 SUMMARY 


• The most important task ahead of the researcher is to document the entire 
work done in the form of a well-structured research report. 

• The orientation and structure of the report will depend on what kind of 
report is being constructed. It could be brief or detailed, and an academic, 
technical or business report. 

• Reports generally follow a standardized structure. The entire report 
can be divided into three main sections: the preliminary section, the main 
body and endnotes. 

• The annotated bibliography contains complete details of the cited work, 
along with some brief information about the nature of information sought 
from the article. 

• There must be no ambiguity in either presenting the findings or the 
representativeness of the findings. 

• Visual relief for the written text can be provided through figures, tables and graphs. 


13.7 KEY WORDS 


• Annotated bibliography: A bibliography that includes brief explanations 
or notes for each reference. 

• Bibliography: A list of the works of a specific author or publisher. 

• Executive summary: The summary of the entire report, starting from the 
scope and objectives of the study to the methodology employed and the 
results obtained, presented in a brief and concise manner. 

• Letter of transmittal: The letter that broadly refers to the purpose behind 
the study. 

• Working paper: Report that is written for the purpose of recording the 
process carried out in terms of scope and framework of the study, the 
methodology followed and instrument designed. 


13.8 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 
1. What should be the ideal structure of a research report? What are the 
elements of the structure defined by you? 


2. What are the guidelines a researcher must follow for tabular representation 
of the research results? 


3. Differentiate between the referencing protocol of a footnote and bibliography. 
4. How is a technical report different from a management report? 
Long-Answer Questions 
1. What are the different kinds of reports available to the researcher? Do the 
criteria differ for different kinds of reports? Explain with examples. 


2. What are the guidelines for effective report writing? Illustrate with suitable 
examples. 


3. Discuss the guidelines for graphical representations of data in reports. What 
are the audio-visual aids available for the purpose? 


14.0 INTRODUCTION 


In the earlier units, we have understood the process of research as it exists in the 
business world. However, one needs to be clear that, like every other aspect of the 
working environment, the research process also has to be guided and monitored 
by a code of ethics. This becomes important when we see that research requires 
us to collect information and, at times, to conduct experiments to test 
the study hypotheses. Thus it becomes important for the researcher to be absolutely 
ethical and transparent in conducting the study. He also needs to ensure that no 
physical or mental harm is caused to the study respondents. Lastly, in case he 
is conducting the research for a business manager, he must maintain the confidence 
of the client and not reveal the study findings and approach if the client does 
not want them to be made public. 


Thus, in this unit we will learn about how we must conduct ourselves when 
we carry out a research study. This involves a code of ethics as related to the client 
as well as the researcher and the respondents. 


14.1 OBJECTIVES 


After going through this unit, you will be able to: 
• Explain the role of ethics in business research 
• Describe the ethical standards that the client must follow 
• Discuss the ethical responsibilities of the researcher 
• Explain how to protect the rights of the respondent 
• Describe the ethical principles to be followed in the overall research process 


14.2 MEANING OF RESEARCH ETHICS 


Ethical standards are extremely important, whatever the field of study. 
They take on a special meaning in the conduct of research. Rowley (2004) has 
put it very simply as ‘conducting research ethically is concerned with respecting 
privacy and confidentiality, and being transparent in the use of research data. Ethical 
practices hinge on respect and trust and approaches that seek to build rather than 
demolish relationships.’ Russeft et al. (1999) advocated that while conducting 
business research, the approach must be professional and responsible, the data 
collection must be attempted with the respondent's consent under appropriate 
and ethically correct methods; and, last but not the least, the interpretation has to 
be done in a careful unbiased manner. 


A number of corporations have developed their own code of ethics regarding 
the conduct of research. While this practice of defining business ethics, which 
includes research ethics, is common in most organizations in the West, in India this 
is spelt out and documented in the pharmaceutical sector and some banks like 
HSBC. Besides this, there are also well established and detailed ethical guidelines 
available from international bodies, for example, the Social Research Association’s 
(SRA's) ethical guidelines, the American Psychological Association (APA) code 
of ethics, code of standards and ethics for survey research designed by the Council 
of American Survey Research Organizations (CASRO), American Marketing 
Association (AMA) and Business Marketing Association (BMA) code of conduct 
and ethics. 


To understand the code of ethics involved in research, one needs to 
understand the three significant stakeholders involved in any research, namely: 


• The sponsoring clients or decision-makers. 
• The respondents from whom one seeks the information. 
• The researcher himself/herself while administering and compiling the study. 


Each one of these entities has its own specific interests and needs and, 
thus, the ethical concerns regarding each one would be unique. The following 
sections present brief guidelines on the ethical issues and their management. 


14.2.1 Client’s Ethical Code 


Similar to any other business transaction, research is also an exchange process 
between various people. The first of these exchanges is the one between the sponsoring 
client and the investigator. Both parties have an ethical obligation towards each 
other. In case the study is being conducted for a business client, complete 
transparency in terms of data gathering and interpreting is a must. It has been 
observed that the client might be a business manager who, because of his own 
personal interests, might steer the results in a specific direction in order to fulfil a 
hidden agenda. For example, suppose a warehousing organization is looking at 
business expansion and hires a research agency to conduct a research study in 
order to provide direction. It might so happen that the business manager from the 
client side, who is dealing with the research agency, owns a transport fleet and thus 
wants the researcher to recommend courier and transit warehousing services as 
business opportunities that the company can go into. 


Small and relatively young firms are commonly found to 
ask for proposals from research agencies for the conduct of a study. However, 
once they obtain the details of the intended methodology, they usually get the 
study conducted by their own team or by trainees at a low to minimal cost to the 
company. And since the proposals are the first stage of a research bid, the company 
is under no obligation to pay for the research methodology that it has obtained in this 
underhand manner. 


Another instance could be that even though the initial exploratory research 
and literature review indicate the nature of the respondent population, the client 
might, based on his own notions, force the researcher to undertake the study on a 
specific population. For example, if a new technology is being introduced in the 
company and the use requires computer literacy, the client might ask the researcher 
to measure the acceptability of the product amongst only the computer-savvy 
population. Thus the results would automatically be bent towards acceptance. 


Sometimes, the interpretation and recommendations might be beyond the 
scope of a study. For example, in the organic food study, which was conducted 
amongst retailers and consumers, the client might ask the researcher to suggest 
strategies for educating and building usage and recommendations amongst dieticians 
and doctors. 


It is recommended in such instances that the researcher conduct 
comprehensive exploratory research and develop clearly stated objectives that 
do not leave any scope for unethical intervention. Secondly, he must tell the business 
manager that unless the results are unbiased the study will not contribute to informed 
decision making. In case of an unethical manager or client, it is best to avoid 
making recommendations and formulating strategies and leave the use or non-use 
of the data to the manager. And if nothing works, it is best to terminate the research 
study, as unethical reporting and compilation is bound to spoil the researcher’s 
reputation. 


14.2.2 Researcher’s Ethical Code 


Since the researcher is the most involved and main person responsible for the 
study, it is very important that the highest ethical standards be maintained by him. 
Some specific checks he can look at are as follows: 

Quality control 


A very important consideration, both short-term and long-term, is to maintain the 
standards of quality in the conduct of the study. The researcher must be absolutely 
objective and correct in choosing the research design that would be right for the 
study. For example, for studying the impact of a mathematics study programme 
on an experimental group of children, the researcher must have a matched control 
group of children with a similar understanding of mathematics so that the comparison 
is correct. 


Sometimes, the client might be unaware of the analytical rules and conditions 
for the result to be valid. It is therefore the responsibility of the researcher to be absolutely 
transparent about the significance of the results obtained and to refrain from 
emphasizing findings that might be of very little strength or value. 


Privacy control 


The most significant and important ethical concern of a research study is the issue 
of trust and confidentiality. At no cost must the researcher reveal any aspect of the 
study without the consent of the client. This could be in terms of not revealing the 
name of the company. For example, if the client is interested in finding out the 
comparative standing of their product with the competitor’s product, it becomes 
critical to conduct the study amongst users of the product category rather than 
only the company brand in order to get an unbiased evaluation. 


The researcher might also need to guard the reason or purpose of the study. 
For example, if the client wants to measure the potential of a new product, then revealing 
the reason for the study might lead to the concept or idea being adopted and 
converted into a product prototype by someone else before the client is out with 
the offering. The third level of confidentiality that the researcher must ensure is the 
complete confidentiality of the findings till the research outcome has been converted 
into a business decision. For example, based on the organizational health index of 
its workers and the attrition rate, the correlation between the two variables might 
be alarming enough to require a major restructuring of the existing employee benefits 
and work policy. Or the research study might involve a comprehensive and detailed 
study of potential candidates being considered for the role of the CEO, as the 
existing leader is due for retirement. Thus, revelation of the findings of such research 
might lead to turbulence and divided opinion in the organization. Thus the results 
should not be made available to all till they have been brought into action. 


Check Your Progress 
1. List some of the international bodies which have provided well established 
and detailed ethical guidelines. 
2. State the most significant and important ethical concern of a research study. 


3. Mention the controls which are important checks in a study. 


14.3 ETHICAL CODES RELATED TO 
RESPONDENTS 


The most important and vulnerable person in the research study is the respondent 
from whom the data is to be collected. Every association and organization that is 
directly or indirectly involved with research has made clear and detailed guidelines 
for ensuring that unethical treatment of the respondent does not happen. 


The American Association for Public Opinion Research has formulated the 
following code of ethics for survey researchers, with reference to the respondent: 


• We shall strive to avoid the use of practices or methods that may harm, 
humiliate or seriously mislead survey respondents. 


• Unless the respondent waives confidentiality for specific uses, we shall hold 
as privileged and confidential all information that might identify a respondent 
with his or her responses. We shall also not disclose or use the names of 
respondents for non-research purposes unless the respondent grants us 
permission to do so. 


Study disclosure 


The researcher needs to share complete and transparent information regarding the 
purpose of collecting data and what sort of information would be required from 
the respondent. The respondent must know what kind of questioning would be done, 
so that he is able to understand what the researcher is looking for, whether he 
has the information, whether he wants to share all or part of it, and how much 
time and effort would be needed. For example, for a new concept test, a 
segmentation analysis or an organizational climate survey, the administration would 
require considerable time and commitment from the respondent. Secondly, if it is 
a before-and-after product acceptability or usage study, the person would 
be contacted twice to assess the experience. Thus the researcher needs to be 
absolutely truthful about the nature and objectives of the study. 


Coercion and influence 


The researcher should not at any stage, either before or during the data collection 
stage, try to pressurize the respondent through persuasive influence or by forcing 
him to share information. For example, if the respondent has been through some 
traumatic experience, he/she might not want to share all details with a stranger, 
even if it is for an objective study. Schinke and Gilchrist (1993) state that under 
standards set by the National Commission for the Protection of Human Subjects, 
all informed-consent procedures must meet three criteria: 


• Participants must be competent to give consent 
• Sufficient information must be provided to allow for a reasonable decision 
• Consent must be voluntary and uncoerced 


Sometimes, it may so happen that the respondent is too young or too old or 
not literate and thus unable to understand when the researcher might be either 
leading him/her to give certain preset answers or trying to force the person to 
share information that he does not want to reveal or which, once shared, might be 
misinterpreted. 


Sensitivity and respect 


There are certain issues like shoplifting or sexual orientation, which are not topics 
that can be managed in a structured, impersonal manner. The researcher should 
devote more time here and also keep the questions more open-ended, and usually 
such situations need a considerable rapport formation and a non-threatening 
atmosphere. The researcher, at all times, would need to treat the respondent with 
due respect and be transparent about the nature and objective of the questioning. 


Experimentation and implication 


In case the respondent is going to be part of the experimental group subjected to 
any sort of treatment, for example, a new shampoo trial or an intervention 
programme that may involve some behavioural change, complete information must 
be given regarding the course of the experiment and any risk, even minimal, which 
might be involved. The researcher, thus, must ensure minimal risk to the respondent 
and should in no way cause any harm to the person, even if it is for the quest of 
knowledge. Bailey (1978) describes this ‘harm’ as not only hazardous or medical 
experiments but also any social research that might involve such things as discomfort, 
anxiety, harassment, invasion of privacy or demeaning or dehumanizing procedures. 


Agreement or consent 


Once the researcher has clearly communicated the purpose, the nature and likely 
outcome of the study, it is advisable to make a mutual written or unwritten contract. 
This ensures that there is no unpleasantness or legal confrontation on either side. 
Another advantage of this is that in case a point was not very clear the issue gets 
clarified. For example, for a personal care usage study, the consumer might be 
under the impression that a questionnaire on usage would be filled in when actually 
the researcher wants to observe/record the usage ritual. This might call for some 
invasion of privacy of the respondent by the researcher, and thus taking the consent 
beforehand would make things clear for both the parties. 


Sometimes, the nature of the study might require that the name of the company 
be disguised. For example, one cannot start a study by saying, ‘We are conducting 
a survey for Mother Dairy milk; which do you think is the best milk in the city?’ 
In such cases, the respondent can be debriefed after the data has been collected: 
the identity of the sponsoring company and the purpose of the disguise are revealed 
at that point. This helps retain respondents’ goodwill and cooperation. 


14.4 RESPONSIBILITY OF ETHICS IN RESEARCH 


Besides ensuring that specific protocols and codes are followed for both the 
benefactors (clients) and the contributors (respondents), there are some basic tenets 
that the researcher must not forgo. These are significant not only for the body of 
knowledge to which the researcher is contributing but also for the society in which 
we live. 


Professional creed 


This has already been discussed in detail in the two sections above. Here, 
however, professional creed refers to the overall conduct of the researcher, who 
has to be truthful during all phases of the study, whether in the conceptualization, 
conduct or presentation of the research study. 


• At no stage should the researcher exaggerate or underplay the expense or 
effort incurred in the conduct of the study. For example, the investigator 
might overclaim the expense incurred in travel or field visits. On the other 
hand, he might underpay the field investigators engaged for data 
collection, for instance by hiring undergraduate students rather than 
professional investigators. 


• The respondent group being studied should be truly representative of the 
identified respondent population and not a skewed or biased 
sample. Another unethical practice is to conduct the study with a 
professional group of respondents who are well 
versed in the response technique and thus give ‘good’ or predictable answers. 


• The data and the completed questionnaires should come from authentic, 
real-time fieldwork with actual respondents who are representative of the 
population under study, not from fake completion done by the field 
investigators themselves. 


• The findings and results should be presented as they were found in the 
actual study, and under no circumstances must the researcher attempt 
to fudge or manipulate the results. 


Professional confidentiality 


The researcher bears the responsibility of maintaining the confidentiality of 
the research findings and must not make public any aspect of the study, in either 
an apparent or a camouflaged manner. This code of ethics applies to both the 
sponsoring client and the respondent. The anonymity and privacy of the 
respondent are to be respected and not violated. Also, recording private or 
personal behaviour with hidden devices is considered a monumental violation 
of an individual’s right to privacy (e.g., observing people in a fitting room with 
a hidden camera). 


The right to privacy and confidentiality takes on a new meaning in 
cyberspace, where the respondent’s personal and demographic details are 
available to the researching company. These details could be compiled, collated 
and sold as databases to various service providers as authentic location data 
for targeting potential customers. Thus, maintaining the anonymity and 
confidentiality of shared information is a professional norm that any ethical 
researcher should follow. If the data is to be shared, this must be done with the 
consent of the respondent. 


Professional objectivity 


As a true researcher and contributor to the existing body of knowledge, the 
researcher must maintain the objectivity of an absolutely neutral reporter of facts. 
Objectivity must be maintained in all phases of the study, while: 


• Designing the research objectives, which must be based on facts and sound 
analysis rather than simple opinion. 


• Collecting information by using a standard and not differential set of 
instructions. For example, in the intervention study mentioned earlier, the 
researcher must give the instructions in the same way to both the experimental 
and control groups and in no way try to exaggerate the actual impact of the 
treatment. 


• Interpreting and presenting the findings as they are and not in a particular 
direction based on the researcher’s own gut feeling or liking. For example, 
a researcher who is a consumer of organic food might be tempted to 
exaggerate the health benefits of the products, not because that is what 
was found but because, as a consumer of the category, that is what he 
believes. 


Thus, as stated earlier, just as for any other business function, a code of ethics 
for conducting research has been structured and laid out by almost every business 
association. At all times, the researcher must remember that besides aiding 
business decision-making, research also contributes to the huge domain of 
management knowledge. Authentic, transparent and objective reporting 
and compilation of the research therefore becomes all the more critical. 


Check Your Progress 


4. Who is the most important and vulnerable person in the research study? 
5. What is the benefit of a mutual written or unwritten contract? 
6. What is professional creed in research? 


14.5 USES OF LIBRARY AND INTERNET IN 
RESEARCH 


In this section, you will study the use of the library and the Internet in research. 


14.5.1 Uses of Library in Research 


It is common for a researcher to feel confused and disoriented while using the 
library for research. Usually, this happens because the researcher is at a 
loss as to where and how to start searching the library resources. 
Therefore, a systematic and methodical approach to the vast source of 
information that libraries offer is essential. It helps the researcher 
spend his/her time productively on searching for and collecting 
the essential information. The researcher should, therefore, create a concrete 
library research plan. Such a plan enables him/her to make effective use of 
library materials for research. 


Library Research Plan 


A library research plan is a predefined activity that gives direction to your research. 
It involves an evaluation that helps determine the subsequent activities to 
be followed by the researcher. As such, the research plan is a sequence of steps 
that gives the researcher a comprehensible and reliable 
outline to adhere to. The steps in a research plan are 
as follows: 


• Subject evaluation: This involves analysing the research subject from an 
informational perspective. The researcher should find out how much 
information is already available and known to him/her. This gives a 
clear idea of the unknown information that needs to be searched for in the 
library. 


• Determine the scope of research: This involves identifying whether the 
research is a general study or is concerned with a more specific 
investigation, for example, whether your research is concerned with studying 
the eating habits of working women or the eating habits of working people 
in general. You then search for the relevant content accordingly. You should 
also check the chronological, geographical, political and other such aspects of 
your research study, and analyse whether your research deals with 
a specific locality, a particular time span or a current issue. 


• Sort out key words: The research subject should be broken down into a 
set of key terms or key words. A key word is a term that 
expresses one of the basic concepts of the research content and describes 
your broad topic. The researcher should separate out the distinct and unique 
key words. This greatly helps the researcher keep track of the topics 
related to the research subject and avoid deviation. It is thus the most 
significant activity of a researcher, in which he/she determines the basic 
terms to be adhered to while searching for books and other similar information 
resources. 


• Select the right library tool: Depending upon the scope of your research, 
you can turn to the appropriate library tools for collecting the 
information for research. Modern libraries offer a wide range of tools. 
These span from small almanacs and handbooks to 
the most comprehensive books and anthologies and, most obviously, 
the computerized library catalogue, which is a result of the information 
technology revolution. 
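

The key-word step above can be illustrated with a short Python sketch. It is only 
an illustration under stated assumptions: the stop-word list, the function names 
and the use of AND to join terms are hypothetical choices made for this example, 
and not every library catalogue accepts Boolean operators. 

```python
import re

# Hypothetical, deliberately short stop-word list used only for this example.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with"}


def extract_keywords(subject: str) -> list[str]:
    """Break a research subject into lowercase key words.

    Stop words and repeated words are dropped; the original order is kept.
    """
    words = re.findall(r"[a-z]+", subject.lower())
    keywords = []
    for word in words:
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords


def build_query(keywords: list[str]) -> str:
    """Join the key words with AND (assumed to be supported by the catalogue)."""
    return " AND ".join(keywords)


subject = "Eating habits of working women in the city"
keywords = extract_keywords(subject)
print(keywords)
print(build_query(keywords))
```

Narrowing or widening the scope of the research (for example, replacing ‘women’ 
with ‘people’) changes the key words and, with them, the material the search returns. 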


Library Research Tools 


Once the researcher has completed the activities in the research plan, he/she 
should start looking for the essential and relevant information. This involves 
exploring information through traditional and modern library research tools, which 
contain specific bits of information as well as voluminous records and theories. To 
be able to make full use of these tools, the researcher needs to become 
familiar with how each of them is used. The most common 
library research tools available in any library are as follows: 


• Library catalogue: A library catalogue is an informative list of the resources 
and materials available in the library. It comprises the names of books or 
journals along with the names of authors, subjects and publishing houses. 
It thus informs the researcher what is available in the library. In 
developed countries, such catalogues are usually stored as computerized 
databases that support searches under headings like ‘author’, ‘title’, 
‘subject’ and ‘keywords’. However, in India, many government-funded 
educational libraries still use paper-based catalogues, 
unlike libraries such as the British Council Library, the American 
Library and the Indian Institute of Technology Library. It is, therefore, 
advisable for a researcher to refine his/her research subject to specific 
key words and search for the necessary information using these key 
words. 


• Almanacs: An almanac is a chronological, tabular publication that is published 
annually. Traditional almanacs usually contained 
weather forecasts, astronomical data and other statistics such as the 
times of the rising and setting of the sun and moon, eclipses, etc. Present-day 
almanacs, however, have become comprehensive and include statistical and 
explanatory information about happenings all over the world. Topical 
weather developments, historical events, factual information, etc., are the 
features of present-day almanacs. A researcher can use these to quickly grasp 
the facts relating to his/her research topic. 


• Dictionaries: Dictionaries are often regarded as a superficial source 
of information, since they are thought to perform the sole function of defining 
the meanings of words. However, contemporary editions of dictionaries 
are innovative in their own way and explain innumerable terms in 
the context of their several usages. As such, a researcher should 
seek factual information in the dictionary and also look at the related phrases 
mentioned in the context of the word being searched to get a broader view 
of his/her topic of interest. 


• Encyclopaedia: An encyclopaedia is, in general, a bulky informative volume 
containing synopses of subjects arranged in 
alphabetical order. It is, in fact, an extension of the concept of the 
dictionary, in which words are described precisely. The background context 
that an encyclopaedia adds makes the reader better acquainted with the research term 
and, therefore, it is always preferable to consult an encyclopaedia to get a 
better knowledge of the subject matter. This also helps the researcher 
understand the jargon and terminology related to his/her subject of 
research. 


• Bibliographies: A bibliography usually comprises a list of the reference materials 
used by the researcher. This list gives the names of the various sources, 
such as books and articles, that the researcher has consulted for the 
research. A bibliography is placed at the end of the article or 
research paper. A bibliography on a particular topic may also be 
published on its own as a book. There are two types of bibliographies, 
depending upon the information they provide: 


o Enumerative bibliography: This is also known as compilative, reference 
or systematic bibliography. It gives a general idea of the relevant 
publications on a specific subject. The most common format 
used by the researcher for citations in such a bibliography is as 
follows: 

  • Author 
  • Title 
  • Publishing company 
  • Publication date 


o Analytical bibliography: This type is further classified into descriptive, 
historical and textual bibliographies. Usually, these are concerned with 
the physical attributes and contemporary importance of a book. They 
take into account the size, format and context in which the book was 
printed and published, etc. Therefore, such bibliographies are not very 
closely connected to any form of research. 


Bibliographies, however, provide a good research tool for 
exploring effectively the vast data available in a library. 


• Indexes: Indexes, as is well known, are alphabetical lists of authors’ 
names and subjects, giving the page numbers where these topics 
and authors are discussed or described in the book. In research, 
they facilitate searching through a number of journals simultaneously and 
thus provide considerable information at the start of the research process. 
Indexes cover a large variety of sources, including books, periodicals, 
conference papers, reports, theses and articles. 


• Search engines: Search engines are software programs that browse through the 
Internet for the queried information and, within a few seconds, list the sites 
that contain it. They operate automatically 
and collect the words available on a vast number of web pages. A researcher 
needs to learn how to use a search engine effectively. 
He/She should also know how to evaluate the 
results that a search engine provides. Well-known examples of search engines 
include Google and AltaVista. 
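

The catalogue headings and the enumerative citation format described in the list 
above can be sketched in Python as follows. This is a minimal illustration and not 
the interface of any real library system: the class name, the function names and 
the two sample records are invented for the example. 

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueRecord:
    """One catalogue entry, using the search headings named in the text."""
    author: str
    title: str
    publisher: str
    year: int
    subject: str
    keywords: list[str] = field(default_factory=list)


def search(records, heading, term):
    """Return the records whose chosen heading contains the term (case-insensitive)."""
    term = term.lower()
    matches = []
    for record in records:
        value = getattr(record, heading)
        # The keywords heading holds a list; the other headings hold single values.
        text = " ".join(value) if isinstance(value, list) else str(value)
        if term in text.lower():
            matches.append(record)
    return matches


def enumerative_entry(record):
    """Format author, title, publishing company and publication date, in that order."""
    return f"{record.author}. {record.title}. {record.publisher}, {record.year}."


# Invented sample records; they do not refer to real publications.
catalogue = [
    CatalogueRecord("Author A", "Eating Habits of Working Women", "Example Press",
                    2001, "Nutrition", ["eating habits", "working women"]),
    CatalogueRecord("Author B", "Survey Methods", "Sample House",
                    1998, "Research methodology", ["survey", "sampling"]),
]

for hit in search(catalogue, "keywords", "working women"):
    print(enumerative_entry(hit))
```

Searching under a different heading, such as ‘author’ or ‘subject’, only requires 
changing the second argument passed to search. 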


It is evident from the above discussion that with technological innovations, 
the nature of library research has changed tremendously. It has extended its scope 
and now deals with a vast range of information simultaneously. The researcher today, 
therefore, needs to familiarize himself/herself with the relevant jargon and terminology 
to get a good grasp of the research tools. The description that follows covers the 
terms commonly used in library research. 


Use of Library Resources 


There are innumerable information materials available in a library. In general, the 
prime sources of information can be classified and explained in the context of 
research usability as follows: 


• Books: Undoubtedly, books are the most significant and chief source of 
information in any library. However, irrelevance can greatly hamper a book’s 
usefulness for the researcher. It is, therefore, essential for a researcher to 
examine whether the book he/she is referring to is relevant to his/her research 
study. A researcher should also check the authority of the book, 
i.e., whether the author is an expert in the field or whether a well-known 
publishing house has published the book. Checking how current the book is will 
also help the researcher remain up to date in conducting his/her research 
study. 


• Journals: Journals are periodical collections of articles. They also 
include sources such as reports, bulletins or proceedings, published periodically, 
for example monthly, by an organization or institution. There are also scholarly 
journals, which are of great help to a researcher working in the humanities or 
social sciences. Journal articles usually offer thorough 
knowledge or up-to-date information on the subject matter of 
research. 


• Thesis: In simple terms, a thesis is a piece of research work conducted 
by an individual to qualify for a degree. Libraries often hold copies 
of theses that have been submitted and approved for a Ph.D. A researcher 
can browse through such theses to get an idea of the layout and presentation 
options for his/her own thesis. While making use of such theses, the researcher 
should remember that they are copyright-protected materials and that, before 
quoting from a thesis, it is essential to obtain prior written consent from the 
author. 


• Manuscripts and archives: These are unpublished, sometimes handwritten 
or typed, original and primary sources of information. A study of 
these manuscripts gives the researcher an idea of the original studies 
conducted in the past and thus guides him/her in conducting the research. 
Manuscripts can be cited as examples to support the researcher’s points by 
referring to past occurrences and events. It is, however, absolutely 
important to preserve the integrity of such sources and, therefore, a 
researcher should reproduce the original text exactly where required, without any 
omissions, additions or corrections. 


It is absolutely important for the researcher to check the relevance of the 
materials while citing anything from the sources mentioned above in his/her 
research study. There is a general tendency to simply use whatever comes first 
to hand. There is also a tendency to believe anything that is in print. However, 
good research is not the result of such quick and simple adaptations. A researcher 
should allow plenty of time to conduct the research study. He/She should examine 
every available source of information to its fullest usable degree. 


It is also necessary to check the accuracy, timeliness and depth of the content 
in the source of information. Check whether the topic is covered 
exhaustively by the book, thesis or journal you have consulted. It is 
always advisable for the researcher to consider the target audience he/she is going 
to address. For example, research aimed at 
students of postgraduate courses should use a refined and advanced level 
of vocabulary. In that case, the use of a book meant for high school students 
should be restricted to selecting the basic ideas required for the research. 
However, it is the researcher’s job to adapt to his/her target audience and 
present the ideas in his/her own unique style. 



The researcher, while conducting a research, collects a large amount of 
data and stores them in the computer. However, it depends on the hardware and 
software capacity of the computer to store the information and data. The hardware 
and software capacity can be changed or managed as per the requirements of the 
organization. 


14.5.2 Uses of Internet in Research 


Before the evolution of the Internet, conducting research involved a set of 
encyclopaedias and a trip to the library. Now, however, we live in an age in which 
information and data are easily accessible on a computer via the Internet. 
The Internet is the 
fastest developing and the largest repository of data. A researcher 
can find information on the Internet about any topic he/she desires. The Internet 
acts as a huge database of content through which a researcher can access an 
unlimited number of informative sources. 


Research itself is a very wide term. It means a systematic enquiry into 
facts. There are various common applications of Internet research. One such 
application is personal research, undertaken 
to enquire about a particular subject such as news or health problems. 
Other applications include research undertaken 
by students for academic projects and papers, and by writers and journalists 
researching stories. 


One of the advantages of conducting research using the Internet is that 
hundreds or thousands of pages can be found with some relation to the topic, 
within seconds, which is not possible if the same topic is to be searched from 
books or encyclopaedias. Moreover, the Internet also includes e-mail, online 
discussion forums and other communication facilities such as instant messaging 
and newsgroups that help the researchers have a direct access to the experts and 
other individuals with relevant knowledge and interests. 


There are various tools, such as Internet search engines and Internet guides, 
that a researcher can use for collecting information. A search engine is an 
online database of Internet resources. When the researcher poses a query 
about a particular topic, the search engine looks for likely matches within 
the database and displays the relevant content accordingly. Unlike a standard 
search engine, an Internet guide contains information that is 
compiled and organized by humans, not computer programmes. 
Encyclopaedia Britannica is an example of an Internet guide that covers a vast 
range of different topics. 
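

The way a search engine matches a query against its database can be illustrated 
with a toy inverted index in Python. This is a simplified sketch, not a description 
of how any real search engine is built: the page names and page texts are invented, 
and real engines also crawl the web and rank their results, which is omitted here. 

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of pages on which it appears."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented pages used only to demonstrate the lookup.
pages = {
    "page-a": "organic food health benefits",
    "page-b": "eating habits of working women",
    "page-c": "organic milk consumer survey",
}

index = build_index(pages)
print(sorted(lookup(index, "organic")))
print(sorted(lookup(index, "organic food")))
```

Adding more words to a query narrows the result set, which is why a well-chosen 
set of key words matters as much on the Internet as in a library catalogue. 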


However, there is one disadvantage for researchers in conducting 
research with the help of the Internet: the majority of the 
content available on the Internet is self-submitted, and there are few rules and 
regulations governing what an author can 
publish and what he/she cannot. As a result, the content on the Internet may 
sometimes be inaccurate and opinion-based. 


Nevertheless, the Internet must not be disregarded as a major source for 
conducting research. It is one of the major sources of journals, books, general 
information and other relevant content. Therefore, we can say that the Internet is 
a very important source of information for researchers in this modern age. 


Check Your Progress 


7. What is the importance of thesis in research? 


8. Give an example of internet guides. 


14.6 ANSWERS TO CHECK YOUR PROGRESS 
QUESTIONS 


1. Some of the international bodies which have provided well established 
and detailed ethical guidelines include the Social Research Association 
(SRA), the American Psychological Association (APA), the Council 
of American Survey Research Organizations (CASRO), the American 
Marketing Association (AMA), Business Marketing Association 
(BMA), etc. 


2. The most significant and important ethical concern of a research study is the 
issue of trust and confidentiality. 


3. The control of quality and privacy are important checks in a study. 


4. The most important and vulnerable person in the research study is the 
respondent from whom the data is to be collected. 


5. The benefit of a mutual written or unwritten contract is that there is no 
unpleasantness or legal confrontation on either side. Another advantage of 
this is that in case a point was not very clear, the issue gets clarified. 


6. Professional creed refers to the overall conduct of the researcher, who has 
to be truthful during all phases of the study, whether in the conceptualization, 
conduct or presentation of the research study. 


7. A researcher can browse through existing theses to get an idea of the 
layout and presentation options for his/her own thesis. 


8. Unlike a standard search engine, an Internet guide contains information that 
is compiled and organized by humans, not computer 
programmes. Encyclopaedia Britannica is an example of an Internet guide that 
covers a vast range of different topics. 


14.7 SUMMARY 


Ethics are extremely important in research and standard guidelines for this 
are available from different associations. 


The client must not use pressure to steer the results in the direction they 
want. 


The researcher has maximum responsibility in following a code to ensure 
quality of reporting, being transparent and yet maintain the privacy of both 
the client as well as the respondent. 


Utmost care must be taken at all times to protect the rights of the respondent. 


The researcher has to be absolutely transparent and objective while 
conducting and interpreting the research study results. 


A library research plan is a predefined activity that gives direction to your 
research. It is an act that involves evaluation that helps determine the 
subsequent activities to be followed by the researcher. As such, the research 
plan is a sequence of steps that the researcher should follow in order to get 
a comprehensible and reliable outline to adhere to. 


Once the researcher has executed the activities involved in the research 
plan, he/she should start looking for the essential and relevant information. 
This involves exploring information through traditional and modern library 
research tools, which contain specific bits of information as well as voluminous 
records and theories. 


Before the evolution of the Internet, conducting research involved a 
set of encyclopaedias and a trip to the library. Now, however, 
information and data are easily accessible on a computer via the Internet. 
The Internet is the fastest developing and the largest repository of data. A 
researcher can find information on the Internet about any topic he/she desires. 
The Internet acts as a huge database of content through which a researcher can 
access an unlimited number of informative sources. 


14.8 KEY WORDS 


BMA: Business Marketing Association. 


Quality control: Maintaining the highest quality standards while conducting 
the research study. 


Research ethics: A set of principles or guidelines that will assist the 
researcher in making difficult research decisions and in deciding which goals 
are most important in reconciling conflicting values. 


Stakeholders of research: Client, researcher and the respondents. 


Library research plan: A library research plan is a predefined activity that 
gives direction to your research. 


Library catalogue: A library catalogue is an informative list of resources 
and materials available in the library. It comprises the name of the book or 
journal, along with the concerned author names and also includes subject 
names and the name of the relevant publishing house. 


Search engines: Search engines refer to software that browses through the 
Internet for the queried information and provides sites that contain the 
concerned information within a few seconds. 


14.9 SELF ASSESSMENT QUESTIONS AND 
EXERCISES 


Short-Answer Questions 


1. What are the three basic principles of professional ethics that any research 
must follow? 
2. Who are the three significant stakeholders involved in any research? 
3. What is study disclosure? 
4. Write a note on library catalogues. 
5. Explain the use of manuscripts and archives in research. 


Long-Answer Questions 


1. How can you follow an ethical path for conducting your research? Are 
there any guidelines available for this? Elaborate. 
2. Does the client also need to maintain certain ethical standards? Explain. 
3. What are the aspects that the researcher has to be careful about while 
conducting a study? 
4. How do you follow an ethical practice while collecting information from the 
respondents? 
5. Discuss the various library research tools available to today’s researchers. 
6. Explain the importance of the Internet in research. 